
Platform for AI: Built-in processors

Last Updated: Mar 06, 2026

A processor is a package that contains online prediction logic. Elastic Algorithm Service (EAS) provides several common built-in processors to help you deploy standard models and reduce the cost of developing online prediction logic.

EAS provides the processors described in the following table. When using EASCMD to deploy a service, specify a processor code.

Processor codes are used for EASCMD deployment only.

| Processor name | Processor code (CPU version) | Processor code (GPU version) | Documentation |
| --- | --- | --- | --- |
| EasyRec | easyrec-2.4 | easyrec-2.4 | EasyRec processor |
| TorchEasyRec | easyrec-torch-1.0 | easyrec-torch-1.0 | TorchEasyRec processor |
| PMML | pmml | None | PMML processor |
| TensorFlow 1.12 | tensorflow_cpu_1.12 | tensorflow_gpu_1.12 | TensorFlow 1.12 processor |
| TensorFlow 1.14 | tensorflow_cpu_1.14 | tensorflow_gpu_1.14 | TensorFlow 1.14 processor |
| TensorFlow 1.15 | tensorflow_cpu_1.15 | tensorflow_gpu_1.15 | TensorFlow 1.15 processor (with a built-in PAI-Blade agile optimization engine) |
| TensorFlow 2.3 | tensorflow_cpu_2.3 | None | TensorFlow 2.3 processor |
| PyTorch 1.6 | pytorch_cpu_1.6 | pytorch_gpu_1.6 | PyTorch 1.6 processor (with a built-in PAI-Blade agile optimization engine) |
| Caffe | caffe_cpu | caffe_gpu | Caffe processor |
| PS algorithm | parameter_sever | None | PS algorithm processor |
| Alink | alink_pai_processor | None | None |
| xNN | xnn_cpu | None | None |
| EasyVision | easy_vision_cpu_tf1.12_torch151 | easy_vision_gpu_tf1.12_torch151 | EasyVision processor |
| EasyTransfer | easytransfer_cpu | easytransfer_gpu | EasyTransfer processor |
| EasyNLP | easynlp | easynlp | EasyNLP processor |
| EasyCV | easycv | easycv | EasyCV processor |
| Blade | blade_cpu | blade_cuda10.0_beta | None |
| MediaFlow | None | mediaflow | MediaFlow processor |
| Triton | None | triton | Triton processor |

PMML processor

The built-in PMML processor in EAS performs these operations:

  • Load a PMML model file as a service.

  • Process requests sent to the model service.

  • Calculate request results using the model and return the results to clients.

The PMML processor provides a default policy to handle missing values. If the isMissing policy is not specified for feature fields in the PMML model file, the system uses these default values for padding.

| Data type | Default padding value |
| --- | --- |
| BOOLEAN | false |
| DOUBLE | 0.0 |
| FLOAT | 0.0 |
| INT | 0 |
| STRING | "" |
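The default padding policy can be illustrated with a short Python sketch. The pad_missing helper and the feature-type map are hypothetical, for illustration only; the actual padding happens inside the processor.

```python
# Hypothetical illustration of the PMML processor's default padding policy.
# Neither the helper nor the type map is part of EAS.
DEFAULT_PADDING = {
    "BOOLEAN": False,
    "DOUBLE": 0.0,
    "FLOAT": 0.0,
    "INT": 0,
    "STRING": "",
}

def pad_missing(features, field_types):
    """Fill absent feature fields with the default value for their data type."""
    return {
        name: features.get(name, DEFAULT_PADDING[ftype])
        for name, ftype in field_types.items()
    }

field_types = {"age": "INT", "income": "DOUBLE", "city": "STRING"}
print(pad_missing({"age": 30}, field_types))
# {'age': 30, 'income': 0.0, 'city': ''}
```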

Deploy a PMML model in any of these ways:

  • Upload in the console

    Set Processor Type to PMML. For more information, see Deploy a service by uploading a model in the console.

  • Deploy using a local client

    In the service.json configuration file, set the processor field to the corresponding processor code, which is pmml. The following code provides an example.

    {
      "processor": "pmml",
      "generate_token": "true",
      "model_path": "http://xxxxx/lr.pmml",
      "name": "eas_lr_example",
      "metadata": {
        "instance": 1,
        "cpu": 1
      }
    }

    Note: 4 GB of memory is automatically allocated to each CPU core. One CPU core with 4 GB of memory is called one quota.
  • Deploy using DSW

    This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD).

TensorFlow 1.12 processor

The TensorFlow 1.12 processor provided by EAS can load TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deploying them. For more information, see TensorFlow FAQ.

Note

This processor does not support custom TensorFlow OPs.

Deploy a TensorFlow model in any of these ways:

  • Upload in the console

    Set Processor Type to TensorFlow1.12. For more information, see Deploy a service by uploading a model in the console.

  • Deploy using a local client

    In the service.json configuration file, set the processor field to the corresponding processor code, which is tensorflow_cpu_1.12 or tensorflow_gpu_1.12. Select the code based on your deployment resources. A deployment error occurs if the specified processor does not match the resources. The following code provides an example.

    {
      "name": "tf_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/savedmodel_example.zip",
      "processor": "tensorflow_cpu_1.12",
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 0,
        "memory": 2000
      }
    }
  • Deploy using DSW

    This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD).
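The processor/resource match described above can be checked locally before deployment. The following sketch is a hypothetical pre-flight check, not part of EASCMD: it assumes that a *_gpu_* processor code requires at least one GPU in metadata and a *_cpu_* code requires none.

```python
import json

def processor_matches_resources(service_json: str) -> bool:
    # Hypothetical check, not part of EASCMD: a *_gpu_* processor code should
    # only be used when the service requests a GPU, and a *_cpu_* code when
    # it requests none.
    cfg = json.loads(service_json)
    processor = cfg["processor"]
    gpus = cfg.get("metadata", {}).get("gpu", 0)
    if "_gpu_" in processor:
        return gpus >= 1
    if "_cpu_" in processor:
        return gpus == 0
    return True  # other processor codes are not resource-specific

service = json.dumps({
    "name": "tf_serving_test",
    "processor": "tensorflow_cpu_1.12",
    "model_path": "http://xxxxx/savedmodel_example.zip",
    "metadata": {"instance": 1, "cpu": 1, "gpu": 0, "memory": 2000},
})
print(processor_matches_resources(service))  # True
```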

TensorFlow 1.14 processor

The TensorFlow 1.14 processor provided by EAS can load TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deploying them. For more information, see TensorFlow FAQ.

Note

This processor does not support custom TensorFlow OPs.

Deploy a TensorFlow model in any of these ways:

  • Upload in the console

    Set Processor Type to TensorFlow1.14. For more information, see Deploy a service by uploading a model in the console.

  • Deploy using a local client

    In the service.json configuration file, set the processor field to the corresponding processor code, which is tensorflow_cpu_1.14 or tensorflow_gpu_1.14. Select the code based on your deployment resources. A deployment error occurs if the specified processor does not match the resources. The following code provides an example.

    {
      "name": "tf_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/savedmodel_example.zip",
      "processor": "tensorflow_cpu_1.14",
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 0,
        "memory": 2000
      }
    }
  • Deploy using DSW

    This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD).

TensorFlow 1.15 processor (with a built-in PAI-Blade agile optimization engine)

The TensorFlow 1.15 processor provided by EAS can load TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deploying them. For more information, see TensorFlow FAQ.

Note
  • This processor does not support custom TensorFlow OPs.

  • This processor has the built-in PAI-Blade agile optimization engine. Use it to deploy TensorFlow models optimized by the PAI-Blade agile optimization engine.

Deploy a TensorFlow model in any of these ways:

  • Upload in the console

    Set Processor Type to TensorFlow1.15. For more information, see Deploy a service by uploading a model in the console.

  • Deploy using a local client

    In the service.json configuration file, set the processor field to the corresponding processor code, which is tensorflow_cpu_1.15 or tensorflow_gpu_1.15. Select the code based on your deployment resources. A deployment error occurs if the specified processor does not match the resources. The following code provides an example.

    {
      "name": "tf_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/savedmodel_example.zip",
      "processor": "tensorflow_cpu_1.15",
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 0,
        "memory": 2000
      }
    }
  • Deploy using DSW

    This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD). For more information about the parameters in the configuration file, see Create a service.

TensorFlow 2.3 processor

The TensorFlow 2.3 processor provided by EAS can load TensorFlow models in SavedModel (recommended) or SessionBundle format. Convert Keras and Checkpoint models to SavedModel format before deploying them. For more information, see TensorFlow FAQ.

Note

This processor does not support custom TensorFlow OPs.

Deploy a TensorFlow model in any of these ways:

  • Upload in the console

    Set Processor Type to TensorFlow2.3. For more information, see Deploy a service by uploading a model in the console.

  • Deploy using a local client

    In the service.json configuration file, set the processor field to the corresponding processor code, which is tensorflow_cpu_2.3. The following code provides an example.

    {
      "name": "tf_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/savedmodel_example.zip",
      "processor": "tensorflow_cpu_2.3",
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 0,
        "memory": 2000
      }
    }
  • Deploy using DSW

    This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD).

PyTorch 1.6 processor (with a built-in PAI-Blade agile optimization engine)

The PyTorch 1.6 processor provided by EAS can load models in TorchScript format. For more information, see the official TorchScript documentation.

Note
  • This processor does not support PyTorch extensions, and model inputs and outputs must be tensors.

  • This processor has the built-in PAI-Blade agile optimization engine and can be used to deploy optimized PyTorch models.

Deploy a TorchScript model in any of these ways:

  • Upload in the console

    Set Processor Type to PyTorch1.6. For more information, see Deploy a service by uploading a model in the console.

  • Deploy using a local client

    In the service.json configuration file, set the processor field to the corresponding processor code, which is pytorch_cpu_1.6 or pytorch_gpu_1.6. Select the code based on your deployment resources. A deployment error occurs if the specified processor does not match the resources. The following code provides an example.

    {
      "name": "pytorch_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/torchscript_model.pt",
      "processor": "pytorch_gpu_1.6",
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 1,
        "cuda": "10.0",
        "memory": 2000
      }
    }
  • Deploy using DSW

    This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD). For more information about the parameters in the configuration file, see Create a service.

Caffe processor

The Caffe processor provided by EAS can load deep learning models trained using the Caffe framework. Because the Caffe framework is flexible, specify the names of model and weight files in the model package when deploying a Caffe model.

Note

This processor does not support custom data layers.

Deploy a Caffe model in any of these ways:

  • Upload in the console

    Set Processor Type to Caffe. For more information, see Deploy a service by uploading a model in the console.

  • Deploy using a local client

    In the service.json configuration file, set the processor field to the corresponding processor code, which is caffe_cpu or caffe_gpu. Select the code based on your deployment resources. A deployment error occurs if the specified processor does not match the resources. The following code provides an example.

    {
      "name": "caffe_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/caffe_model.zip",
      "processor": "caffe_cpu",
      "model_config": {
        "model": "deploy.prototxt",
        "weight": "bvlc_reference_caffenet.caffemodel"
      },
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 0,
        "memory": 2000
      }
    }
  • Deploy using DSW

    This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD).
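Because the file names in model_config must match the contents of the model package, a quick local check can catch mismatches before deployment. The following stdlib sketch (the check_caffe_package helper is hypothetical, not part of EAS) builds a small in-memory package and verifies it:

```python
import io
import zipfile

def check_caffe_package(zf: zipfile.ZipFile, model_config: dict) -> list:
    """Return the model_config file names that are missing from the package."""
    names = {info.filename.rsplit("/", 1)[-1] for info in zf.infolist()}
    required = (model_config["model"], model_config["weight"])
    return [f for f in required if f not in names]

# Build an in-memory demo package containing the two files from the example.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("deploy.prototxt", "")
    zf.writestr("bvlc_reference_caffenet.caffemodel", "")

with zipfile.ZipFile(buf) as zf:
    missing = check_caffe_package(zf, {
        "model": "deploy.prototxt",
        "weight": "bvlc_reference_caffenet.caffemodel",
    })
print(missing)  # []
```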

PS algorithm processor

The PS algorithm processor provided by EAS can load models in PS format.

This section describes how to use the PS algorithm processor to deploy a model service and send service requests.

  • Deploy a model in PS format in any of these ways:

    • Upload in the console

      Set Processor Type to PS Algorithm. For more information, see Custom deployment.

    • Deploy using a local client

      In the service.json configuration file, set the processor field to the corresponding processor code, which is parameter_sever. The following code provides an example.

      {
        "name":"ps_smart",
        "model_path": "oss://examplebucket/xlab_m_pai_ps_smart_b_1058272_v0.tar.gz",
        "processor": "parameter_sever",
        "metadata": {
          "region": "beijing",
          "cpu": 1,
          "instance": 1,
          "memory": 2048
        }
      }
    • Deploy using DSW

      This method is similar to deploying using a local client. Edit the service.json configuration file. For more information, see Deploy a service using a local client (EASCMD).

  • Description

    Single and batch prediction requests are supported. The request data structure is the same for both types of requests and is a JSON array that contains feature objects.

    • Example of a single request

      curl "http://eas.location/api/predict/ps_smart" -d '[
          {
              "f0": 1,
              "f1": 0.2,
              "f3": 0.5
          }
      ]'
    • Example of a batch request

      curl "http://eas.location/api/predict/ps_smart" -d '[
          {
              "f0": 1,
              "f1": 0.2,
              "f3": 0.5
          },
          {
              "f0": 1,
              "f1": 0.2,
              "f3": 0.5
          }
      ]'
    • Return value

      The return value format is the same for single and batch requests. The return value is an array that contains return objects. The position of each return object corresponds to the position of the data in the request.

      [
        {
          "label":"xxxx",
          "score" : 0.2,
          "details" : [{"k1":0.3}, {"k2":0.5}]
        },
        {
          "label":"xxxx",
          "score" : 0.2,
          "details" : [{"k1":0.3}, {"k2":0.5}]
        }
      ]
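Because the return objects are positional, a client can pair each result with the request that produced it. A minimal stdlib sketch (the payloads mirror the examples above; the values are illustrative):

```python
import json

# The batch request body and a response in the format shown above.
request_body = [
    {"f0": 1, "f1": 0.2, "f3": 0.5},
    {"f0": 1, "f1": 0.2, "f3": 0.5},
]
response_body = json.loads("""[
  {"label": "xxxx", "score": 0.2, "details": [{"k1": 0.3}, {"k2": 0.5}]},
  {"label": "xxxx", "score": 0.2, "details": [{"k1": 0.3}, {"k2": 0.5}]}
]""")

# The i-th return object corresponds to the i-th feature object.
for req, res in zip(request_body, response_body):
    print(req["f0"], res["label"], res["score"])
```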

EasyTransfer processor

The EasyTransfer processor provided by EAS can load TensorFlow-based deep learning NLP models trained using the EasyTransfer framework.

Deploy an EasyTransfer model in any of these ways:

  • Upload in the console

    Set Processor Type to EasyTransfer. For more information, see Deploy a service by uploading a model in the console.

  • Deploy using a local client

    In the service.json configuration file, set the processor field to the corresponding processor code, which is easytransfer_cpu or easytransfer_gpu. Select the code based on your deployment resources. A deployment error occurs if the specified processor does not match the resources. In the model_config section, specify the model type in the type field. The following code provides an example for a text classification model. For more information about other parameters, see Create a service.

    • Example configuration for a GPU-based deployment (using a public resource group)

      {
        "name": "et_app_demo",
        "metadata": {
          "instance": 1
        },
        "cloud": {
          "computing": {
            "instance_type": "ecs.gn6i-c4g1.xlarge"
          }
        },
        "model_path": "http://xxxxx/your_model.zip",
        "processor": "easytransfer_gpu",
        "model_config": {
          "type": "text_classify_bert"
        }
      }
    • Example configuration for a CPU-based deployment

      {
        "name": "et_app_demo",
        "model_path": "http://xxxxx/your_model.zip",
        "processor": "easytransfer_cpu",
        "model_config": {
          "type":"text_classify_bert"
        },
        "metadata": {
          "instance": 1,
          "cpu": 1,
          "memory": 4000
        }
      }

    The supported task types are listed in the following table.

    | Task type | type |
    | --- | --- |
    | Text matching | text_match_bert |
    | Text classification | text_classify_bert |
    | Sequence labeling | sequence_labeling_bert |
    | Text embedding | vectorization_bert |

EasyNLP processor

The EasyNLP processor provided by EAS can load PyTorch-based deep learning NLP models trained using the EasyNLP framework.

Deploy an EasyNLP model in any of these ways:

  • Upload in the console

    Set Processor Type to EasyNLP. For more information, see Deploy a service by uploading a model in the console.

  • Deploy using a local client

    In the service.json configuration file, set the processor field to the corresponding processor code, which is easynlp. In the model_config section, specify the model type in the type field. The following code provides an example for a single-label text classification model. For more information about other parameters, see Create a service.

    {
      "name": "easynlp_app_demo",
      "metadata": {
        "instance": 1
      },
      "cloud": {
        "computing": {
          "instance_type": "ecs.gn6i-c4g1.xlarge"
        }
      },
      "model_config": {
        "app_name": "text_classify",
        "type": "text_classify"
      },
      "model_path": "http://xxxxx/your_model.tar.gz",
      "processor": "easynlp"
    }

    The supported task types are listed in the following table.

    | Task type | type |
    | --- | --- |
    | Text classification (single-label) | text_classify |
    | Text classification (multi-label) | text_classify_multi |
    | Text matching | text_match |
    | Sequence labeling | sequence_labeling |
    | Text embedding | vectorization |
    | Chinese summary generation (GPU) | sequence_generation_zh |
    | English summary generation (GPU) | sequence_generation_en |
    | Machine reading comprehension (Chinese) | machine_reading_comprehension_zh |
    | Machine reading comprehension (English) | machine_reading_comprehension_en |
    | WUKONG_CLIP (GPU) | wukong_clip |
    | CLIP (GPU) | clip |

After the service is deployed, on the Elastic Algorithm Service (EAS) page, find the service that you want to call and click View Endpoint Information in the Service Type column to obtain the service endpoint and the token required for authentication. Then call the service using the following Python code.

import requests
# Replace <eas-service-url> with the endpoint of your service.
url = '<eas-service-url>'
# Replace <eas-service-token> with the token of your service.
token = '<eas-service-token>'
# Specify the data for prediction. The following code provides an example for text classification.
request_body = {
    "first_sequence": "hello"
}
 
headers = {"Authorization": token}
resp = requests.post(url=url, headers=headers, json=request_body)
print(resp.content.decode())

EasyCV processor

The EasyCV processor provided by EAS can load deep learning models trained using the EasyCV framework.

Deploy an EasyCV model in any of these ways:

  • Upload in the console

    Set Processor Type to EasyCV. For more information, see Deploy a service by uploading a model in the console.

  • Deploy using a local client

    In the service.json configuration file, set the processor field to the corresponding processor code, which is easycv. In the model_config section, specify the model type in the type field. The following code provides an example for an image classification model. For more information about other parameters, see Create a service.

    {
      "name": "easycv_classification_example",
      "processor": "easycv",
      "model_path": "oss://examplebucket/epoch_10_export.pt",
      "model_config": {"type":"TorchClassifier"},
      "metadata": {
        "instance": 1
      },
      "cloud": {
        "computing": {
          "instance_type": "ecs.gn5i-c4g1.xlarge"
        }
      }
    }

    The supported task types are listed in the following table.

    | Task type | model_config |
    | --- | --- |
    | Image classification | {"type":"TorchClassifier"} |
    | Object detection | {"type":"DetectionPredictor"} |
    | Semantic segmentation | {"type":"SegmentationPredictor"} |
    | YOLOX object detection | {"type":"YoloXPredictor"} |
    | Video classification | {"type":"VideoClassificationPredictor"} |

After the service is deployed, on the Elastic Algorithm Service (EAS) page, find the service that you want to call and click View Endpoint Information in the Service Type column to obtain the service endpoint and the token required for authentication. Then call the service using the following Python code.

import requests
import base64
import json

# Download a sample image.
resp = requests.get('http://examplebucket.oss-cn-zhangjiakou.aliyuncs.com/images/000000123213.jpg')
ENCODING = 'utf-8'
datas = json.dumps({
    "image": base64.b64encode(resp.content).decode(ENCODING)
})
# Replace the value with the token that you obtained.
head = {
    "Authorization": "NTFmNDJlM2E4OTRjMzc3OWY0NzI3MTg5MzZmNGQ5Yj***"
}
for x in range(0, 10):
    # Replace the URL with the endpoint of your service.
    resp = requests.post("http://150231884461***.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test_easycv_classification_example", data=datas, headers=head)
    print(resp.text)

Encode image or video data in Base64 format for transmission. Use the image keyword for image data and the video keyword for video data.
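A request body for either keyword can be built with the standard library alone. The following sketch (the build_payload helper is hypothetical; the bytes are placeholders) shows the Base64 round trip:

```python
import base64
import json

def build_payload(data: bytes, kind: str = "image") -> str:
    """Wrap raw bytes in a request body; kind is "image" or "video"."""
    assert kind in ("image", "video")
    return json.dumps({kind: base64.b64encode(data).decode("utf-8")})

raw = b"\x89PNG...placeholder image bytes"
payload = build_payload(raw, "image")
# The service decodes the Base64 string back to the original bytes.
print(base64.b64decode(json.loads(payload)["image"]) == raw)  # True
```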

EasyVision processor

The EasyVision processor provided by EAS can load deep learning models trained using the EasyVision framework.

Deploy an EasyVision model in any of these ways:

  • Upload in the console

    Set Processor Type to EasyVision. For more information, see Deploy a service by uploading a model in the console.

  • Deploy using a local client

    In the service.json configuration file, set the processor field to the corresponding processor code, which is easy_vision_cpu_tf1.12_torch151 or easy_vision_gpu_tf1.12_torch151. Select the code based on your deployment resources. A deployment error occurs if the specified processor does not match the resources. In the model_config section, specify the model type in the type field. The following code provides an example. For more information about other parameters, see Create a service.

    • Example configuration for a GPU-based deployment

      {
        "name": "ev_app_demo",
        "processor": "easy_vision_gpu_tf1.12_torch151",
        "model_path": "oss://path/to/your/model",
        "model_config": "{\"type\":\"classifier\"}",
        "metadata": {
          "resource": "your_resource_name",
          "cuda": "9.0",
          "instance": 1,
          "memory": 4000,
          "gpu": 1,
          "cpu": 4,
          "rpc.worker_threads" : 5
        }
      }
    • Example configuration for a CPU-based deployment

      {
        "name": "ev_app_cpu_demo",
        "processor": "easy_vision_cpu_tf1.12_torch151",
        "model_path": "oss://path/to/your/model",
        "model_config": "{\"type\":\"classifier\"}",
        "metadata": {
          "resource": "your_resource_name",
          "instance": 1,
          "memory": 4000,
          "gpu": 0,
          "cpu": 4,
          "rpc.worker_threads" : 5
        }
      }

MediaFlow processor

The MediaFlow processor provided by EAS is a universal orchestration engine that can analyze and process videos, audio, and images.

Deploy a MediaFlow model in any of these ways:

  • Upload in the console

    Set Processor Type to MediaFlow. For more information, see Deploy a service by uploading a model in the console.

  • Deploy using a local client

    In the service.json configuration file, set the processor field to the corresponding processor code, which is mediaflow. When using the MediaFlow processor to deploy a model, also add these specific fields to the configuration file. For more information about other fields, see Create a service.

    • graph_pool_size: the number of graph pools.

    • worker_threads: the number of scheduling threads.

    The following is an example.

    • Example configuration for deploying a video classification model

      {
        "model_entry": "video_classification/video_classification_ext.js",
        "name": "video_classification",
        "model_path": "oss://path/to/your/model",
        "generate_token": "true",
        "processor": "mediaflow",
        "model_config": {
          "graph_pool_size": 8,
          "worker_threads": 16
        },
        "metadata": {
          "eas.handlers.disable_failure_handler": true,
          "resource": "your_resource_name",
          "rpc.worker_threads": 30,
          "rpc.enable_jemalloc": true,
          "rpc.keepalive": 500000,
          "cpu": 4,
          "instance": 1,
          "cuda": "9.0",
          "rpc.max_batch_size": 64,
          "memory": 10000,
          "gpu": 1
        }
      }
    • Configuration for speech recognition (ASR) model deployment

      {
        "model_entry": "asr/video_asr_ext.js",
        "name": "video_asr",
        "model_path": "oss://path/to/your/model",
        "generate_token": "true",
        "processor": "mediaflow",
        "model_config": {
          "graph_pool_size": 8,
          "worker_threads": 16
        },
        "metadata": {
          "eas.handlers.disable_failure_handler": true,
          "resource": "your_resource_name",
          "rpc.worker_threads": 30,
          "rpc.enable_jemalloc": true,
          "rpc.keepalive": 500000,
          "cpu": 4,
          "instance": 1,
          "cuda": "9.0",
          "rpc.max_batch_size": 64,
          "memory": 10000,
          "gpu": 1
        }
      }

    The main differences in the service.json configuration between speech recognition and video classification are the values of model_entry, name, and model_path fields. Modify these fields based on the type of model to deploy.
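Since only those three fields differ, one configuration can be derived from the other. A stdlib sketch (the derive_service helper is hypothetical; paths are the placeholders from the examples above):

```python
import copy

# Abridged video classification configuration from the example above.
video_classification = {
    "model_entry": "video_classification/video_classification_ext.js",
    "name": "video_classification",
    "model_path": "oss://path/to/your/model",
    "processor": "mediaflow",
    "model_config": {"graph_pool_size": 8, "worker_threads": 16},
}

def derive_service(base: dict, model_entry: str, name: str, model_path: str) -> dict:
    """Copy a MediaFlow service config, changing only the per-model fields."""
    cfg = copy.deepcopy(base)
    cfg.update({"model_entry": model_entry, "name": name, "model_path": model_path})
    return cfg

asr = derive_service(video_classification,
                     "asr/video_asr_ext.js", "video_asr", "oss://path/to/your/model")
print(asr["name"], asr["processor"])  # video_asr mediaflow
```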

Triton processor

Triton Inference Server is an online service framework from NVIDIA. It provides easy-to-use deployment and management interfaces for models on GPUs and is compatible with KFServing API standards. Its features include:

  • Supports deployment of multiple open source frameworks, such as TensorFlow, PyTorch, ONNX Runtime, and TensorRT. Also supports custom service backends.

  • Supports running multiple models on a single GPU at the same time to improve GPU utilization.

  • Supports HTTP and gRPC communication protocols. Also provides a binary format extension to reduce the size of request payloads.

  • Supports dynamic batching to improve service throughput.

Triton Inference Server is available on EAS as the built-in Triton processor.

Note
  • The Triton processor is in public preview and is available only in the China (Shanghai) region. Other regions are not supported.

  • All models deployed using the Triton service must be stored in OSS. Therefore, activate OSS and upload your model files to OSS in advance. For more information about how to upload files to OSS, see Simple upload.

This section describes how to use the Triton processor to deploy a model service and how to call the service:

  • Deploy a model service using the Triton processor

    A Triton model service can be deployed only by using the eascmd client tool. For more information, see Create a service. When deploying the model service, in the service.json configuration file, set the processor field to the corresponding processor code, which is triton. Because Triton retrieves the model from OSS, also configure the OSS-related parameters. The following code shows an example of the service.json file.

    {
      "name": "triton_test",
      "processor": "triton",
      "processor_params": [
        "--model-repository=oss://triton-model-repo/models",
        "--allow-http=true"
      ],
      "metadata": {
        "instance": 1,
        "cpu": 4,
        "gpu": 1,
        "memory": 10000,
        "resource": "<your resource id>"
      }
    }

    The following table describes specific parameters required to deploy a Triton model service. For more information about other general-purpose parameters, see Parameters in service.json.

    | Parameter | Description |
    | --- | --- |
    | processor_params | Parameters passed to Triton Server at service startup. Unsupported parameters are automatically filtered out. The model-repository parameter is required. For the set of supported parameters, see Table 1 (Parameters that can be passed to Triton Server). For other optional parameters, see main.cc. |
    | oss_endpoint | The OSS endpoint. If this parameter is not specified, the system automatically uses the OSS service in the region where the EAS service is deployed. To use an OSS service in a different region, specify this parameter. For information about valid values, see Endpoints. |
    | metadata.resource | The ID of the dedicated EAS resource group used to deploy the model service. When using the Triton processor to deploy a model service, the resources must belong to a dedicated EAS resource group. For more information about how to create a dedicated EAS resource group, see Use EAS resource groups. |

    Table 1. Parameters that can be passed to Triton Server

    | Parameter | Required | Description |
    | --- | --- | --- |
    | model-repository | Yes | The path must be an OSS path. The root directory of a bucket cannot be used as the model repository; specify a subdirectory within the bucket. For example, in oss://triton-model-repo/models, triton-model-repo is the bucket name and models is a subdirectory in the bucket. |
    | log-verbose | No | For more information about this and the following optional parameters, see main.cc. |
    | log-info | No | |
    | log-warning | No | |
    | log-error | No | |
    | exit-on-error | No | |
    | strict-model-config | No | |
    | strict-readiness | No | |
    | allow-http | No | |
    | http-thread-count | No | |
    | pinned-memory-pool-byte-size | No | |
    | cuda-memory-pool-byte-size | No | |
    | min-supported-compute-capability | No | |
    | buffer-manager-thread-count | No | |
    | backend-config | No | |

  • Use the native Triton client to call the EAS Triton processor service

    To send a request using a Python client, first run these commands to install the official Triton client.

    pip3 install nvidia-pyindex
    pip3 install tritonclient[all]

    Run the following command to download a test image to the current directory.

    wget http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/doc-assets/cat.png

    The following example shows how to use a Python client to send a request in binary format to the Triton processor service.

    import numpy as np
    import time
    from PIL import Image
    
    import tritonclient.http as httpclient
    from tritonclient.utils import InferenceServerException
    
    URL = "<service url>"  # Replace <service url> with the endpoint of the service.
    HEADERS = {"Authorization": "<service token>"} # Replace <service token> with the access token of the service.
    input_img = httpclient.InferInput("input", [1, 299, 299, 3], "FP32")
    img = Image.open('./cat.png').resize((299, 299))
    img = np.asarray(img).astype('float32') / 255.0
    input_img.set_data_from_numpy(img.reshape([1, 299, 299, 3]), binary_data=True)
    
    output = httpclient.InferRequestedOutput(
        "InceptionV3/Predictions/Softmax", binary_data=True
    )
    triton_client = httpclient.InferenceServerClient(url=URL, verbose=False)
    
    start = time.time()
    for i in range(10):
        results = triton_client.infer(
            "inception_graphdef", inputs=[input_img], outputs=[output], headers=HEADERS
        )
        res_body = results.get_response()
        elapsed_ms = (time.time() - start) * 1000
        if i == 0:
            print("model name: ", res_body["model_name"])
            print("model version: ", res_body["model_version"])
            print("output name: ", res_body["outputs"][0]["name"])
            print("output shape: ", res_body["outputs"][0]["shape"])
        print("[{}] Avg rt(ms): {:.2f}".format(i, elapsed_ms))
        start = time.time()