A processor is a package that contains online prediction logic. Elastic Algorithm Service (EAS) of Machine Learning Platform for AI provides built-in processors that are commonly used to deploy models. These built-in processors help you reduce the cost of developing the online prediction logic of models.

The following table describes the names and codes of the processors provided by EAS. If you use the EASCMD client to deploy a model, a processor code is required.
Processor name | Processor code (CPU edition) | Processor code (GPU edition) | References
PMML | pmml | None | PMML Processor
TensorFlow1.12 | tensorflow_cpu_1.12 | tensorflow_gpu_1.12 | TensorFlow1.12 Processor
TensorFlow1.14 | tensorflow_cpu_1.14 | tensorflow_gpu_1.14 | TensorFlow1.14 Processor
TensorFlow1.15 | tensorflow_cpu_1.15 | tensorflow_gpu_1.15 | TensorFlow1.15 processor with a built-in optimization engine based on PAI-Blade of the agility edition
TensorFlow2.3 | tensorflow_cpu_2.3 | None | TensorFlow2.3 Processor
PyTorch1.6 | pytorch_cpu_1.6 | pytorch_gpu_1.6 | PyTorch1.6 processor with a built-in optimization engine based on PAI-Blade of the agility edition
Caffe | caffe_cpu | caffe_gpu | Caffe Processor
Parameter server algorithm | parameter_sever | None | None
Alink | alink_pai_processor | None | None
xNN | xnn_cpu | None | None
EasyVision | easy_vision_cpu_tf1.12_torch151 | easy_vision_gpu_tf1.12_torch151 | EasyVision Processor
EasyNLP | easy_nlp_cpu_tf1.12 | easy_nlp_gpu_tf1.12 | EasyNLP Processor
EasyNLP_with_Transformer | None | easy_nlp_with_transformer_gpu | EasyNLP_with_Transformer Processor
Blade | blade_cpu | blade_cuda10.0_beta | None
MediaFlow | None | mediaflow | MediaFlow Processor
Triton | None | triton | Triton Processor

PMML Processor

You can export traditional machine learning models that are trained in Machine Learning Studio as Predictive Model Markup Language (PMML) files. To export a model in Machine Learning Studio as a PMML file, perform the following operations:
  1. Before you train the model, choose Settings > General in the left-side navigation pane of Machine Learning Studio and select Auto Generate PMML.
  2. After the model is trained, right-click the model training node on the canvas and choose Model Option > Export PMML.
Note In Machine Learning Studio, models that use the following algorithms can be exported as PMML files: Gradient Boosting Decision Tree (GBDT) for binary classification, Support Vector Machine (SVM), logistic regression for binary classification, logistic regression for multiclass classification, random forest, k-means clustering, linear regression, GBDT regression, and scorecard training.
The built-in PMML processor in EAS provides the following features:
  • Loads a model service from a PMML file.
  • Processes requests that are sent to call the model service.
  • Uses the model to calculate the request results and returns the results to clients.
The PMML processor provides a default policy to impute missing values. If the isMissing policy is not specified for the feature columns in the PMML file, the following values are imputed by default.
DataType | Default imputed value
BOOLEAN | false
DOUBLE | 0.0
FLOAT | 0.0
INT | 0
STRING | "" (an empty string)
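For example, if a request omits some feature columns, the processor completes the record with these defaults before scoring. The following minimal Python sketch only illustrates the policy described in the preceding table; the feature names and data types are hypothetical.

    # Hypothetical feature schema: feature name -> PMML DataType
    schema = {"age": "INT", "income": "DOUBLE", "is_member": "BOOLEAN", "city": "STRING"}

    # Default values that the PMML processor imputes when isMissing is not specified
    defaults = {"BOOLEAN": False, "DOUBLE": 0.0, "FLOAT": 0.0, "INT": 0, "STRING": ""}

    def impute(record):
        """Fill missing feature values with the default imputed values."""
        return {name: record.get(name, defaults[dtype]) for name, dtype in schema.items()}

    print(impute({"age": 42, "city": "Hangzhou"}))
    # {'age': 42, 'income': 0.0, 'is_member': False, 'city': 'Hangzhou'}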
You can deploy a model from a PMML file by using one of the following methods:
  • Upload the model file to the console

    Set the Processor Type parameter to PMML. For more information, see Upload and deploy models in the console.

  • Use Machine Learning Studio to deploy the model

    For more information, see Use Machine Learning Studio to deploy models.

  • Use the EASCMD client to deploy the model
    In the service.json service configuration file, set the processor parameter to pmml. The following code block shows an example:
    {
      "processor": "pmml",
      "generate_token": "true",
      "model_path": "http://xxxxx/lr.pmml",
      "name": "eas_lr_example",
      "metadata": {
        "instance": 1,
        "cpu": 1 # Allocate 4 GB memory for each CPU. One CPU and 4 GB memory are considered one quota. 
      }
    }
  • Use Data Science Workshop (DSW) to deploy the model

    Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Deploy models.
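After you deploy the service by using any of the preceding methods, you can call it over HTTP. The following minimal Python sketch uses the requests library; the service endpoint, token, and feature names are placeholders, and the request body format shown here (a JSON array of feature-name-to-value maps) is an assumption that depends on your model, so verify it against the request format that the PMML processor expects.

    import json
    import requests

    # Placeholders: replace with the endpoint and token of your deployed service.
    url = "http://<service endpoint>/api/predict/eas_lr_example"
    headers = {"Authorization": "<service token>"}

    # Assumed request format: a JSON array of feature-name-to-value maps.
    payload = [{"feature_0": 1.0, "feature_1": 0.5}]

    response = requests.post(url, headers=headers, data=json.dumps(payload), timeout=10)
    response.raise_for_status()
    print(response.json())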

TensorFlow1.12 Processor

The TensorFlow1.12 processor that EAS provides can load TensorFlow models in the SavedModel or SessionBundle format. We recommend that you use the SavedModel format. You must convert a Keras or Checkpoint model to a SavedModel model before you can deploy the model. For more information, see Export TensorFlow models in the SavedModel format.
Note The general-purpose processor does not support custom TensorFlow operations.
You can deploy a TensorFlow model by using one of the following methods:
  • Upload the model file to the console

    Set the Processor Type parameter to TensorFlow1.12. For more information, see Upload and deploy models in the console.

  • Use Machine Learning Studio to deploy the model

    For more information, see Use Machine Learning Studio to deploy models.

  • Use the EASCMD client to deploy the model
    In the service.json service configuration file, set the processor parameter to tensorflow_cpu_1.12 or tensorflow_gpu_1.12 based on the model resources. If the value of the processor parameter does not match the type of resource, a deployment error occurs. The following code block shows an example:
    {
      "name": "tf_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/savedmodel_example.zip",
      "processor": "tensorflow_cpu_1.12",
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 0,
        "memory": 2000
      }
    }
  • Use DSW to deploy the model

    Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Deploy models.
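As a reference for the conversion step that is mentioned at the beginning of this section, the following minimal Python sketch exports a TensorFlow 1.x graph in the SavedModel format by using tf.saved_model.simple_save. The graph, tensor names, and export directory are placeholders; replace them with your own model.

    import tensorflow as tf

    # Placeholder graph: x and y stand for the input and output tensors of your model.
    x = tf.placeholder(tf.float32, shape=[None, 10], name="input")
    w = tf.Variable(tf.zeros([10, 1]), name="weight")
    y = tf.matmul(x, w, name="output")

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Export the graph and variables in the SavedModel format.
        tf.saved_model.simple_save(
            sess,
            "./savedmodel_example/1",
            inputs={"input": x},
            outputs={"output": y},
        )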

TensorFlow1.14 Processor

The TensorFlow1.14 processor that EAS provides can load TensorFlow models in the SavedModel or SessionBundle format. We recommend that you use the SavedModel format. You must convert a Keras or Checkpoint model to a SavedModel model before you can deploy the model. For more information, see Export TensorFlow models in the SavedModel format.
Note The general-purpose processor does not support custom TensorFlow operations.
You can deploy a TensorFlow model by using one of the following methods:
  • Upload the model file to the console

    Set the Processor Type parameter to TensorFlow1.14. For more information, see Upload and deploy models in the console.

  • Use Machine Learning Studio to deploy the model

    For more information, see Use Machine Learning Studio to deploy models.

  • Use the EASCMD client to deploy the model
    In the service.json service configuration file, set the processor parameter to tensorflow_cpu_1.14 or tensorflow_gpu_1.14 based on the model resources. If the value of the processor parameter does not match the type of resource, a deployment error occurs. The following code block shows an example:
    {
      "name": "tf_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/savedmodel_example.zip",
      "processor": "tensorflow_cpu_1.14",
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 0,
        "memory": 2000
      }
    }
  • Use DSW to deploy the model

    Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Deploy models.

TensorFlow1.15 processor with a built-in optimization engine based on PAI-Blade of the agility edition

The TensorFlow1.15 processor that EAS provides can load TensorFlow models in the SavedModel or SessionBundle format. We recommend that you use the SavedModel format. You must convert a Keras or Checkpoint model to a SavedModel model before you can deploy the model. For more information, see Export TensorFlow models in the SavedModel format.
Note
  • The general-purpose processor does not support custom TensorFlow operations.
  • TensorFlow1.15 processor provides a built-in optimization engine based on PAI-Blade of the agility edition. You can use this processor to deploy TensorFlow models that are optimized by PAI-Blade of the agility edition.
You can deploy a TensorFlow model by using one of the following methods:
  • Upload the model file to the console

    Set the Processor Type parameter to TensorFlow1.15. For more information, see Upload and deploy models in the console.

  • Use Machine Learning Studio to deploy the model

    For more information, see Use Machine Learning Studio to deploy models.

  • Use the EASCMD client to deploy the model
    In the service.json service configuration file, set the processor parameter to tensorflow_cpu_1.15 or tensorflow_gpu_1.15 based on the model resources. If the value of the processor parameter does not match the type of resource, a deployment error occurs. The following code block shows an example:
    {
      "name": "tf_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/savedmodel_example.zip",
      "processor": "tensorflow_cpu_1.15",
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 0,
        "memory": 2000
      }
    }
  • Use DSW to deploy the model

    Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Deploy models. For more information about the parameters in the service configuration file, see Create a service.
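To call the deployed TensorFlow service, you can use the eas-prediction Python SDK. The following minimal sketch assumes that the SDK is installed (pip install eas-prediction) and that the model exposes a signature named serving_default with a float input tensor named input; replace the endpoint, service name, token, signature, and tensor names with the values of your own service.

    from eas_prediction import PredictClient, TFRequest

    # Placeholders: replace with the endpoint, service name, and token of your service.
    client = PredictClient("http://<your endpoint>", "tf_serving_test")
    client.set_token("<service token>")
    client.init()

    # Build a request for the assumed signature and input tensor.
    request = TFRequest("serving_default")
    request.add_feed("input", [1, 10], TFRequest.DT_FLOAT, [0.0] * 10)

    response = client.predict(request)
    print(response)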

TensorFlow2.3 Processor

The TensorFlow2.3 processor that EAS provides can load TensorFlow models in the SavedModel or SessionBundle format. We recommend that you use the SavedModel format. You must convert a Keras or Checkpoint model to a SavedModel model before you can deploy the model. For more information, see Export TensorFlow models in the SavedModel format.
Note The general-purpose processor does not support custom TensorFlow operations.
You can deploy a TensorFlow model by using one of the following methods:
  • Upload the model file to the console

    Set the Processor Type parameter to TensorFlow2.3. For more information, see Upload and deploy models in the console.

  • Use Machine Learning Studio to deploy the model

    For more information, see Use Machine Learning Studio to deploy models.

  • Use the EASCMD client to deploy the model
    In the service.json service configuration file, set the processor parameter to tensorflow_cpu_2.3. The following code block shows an example:
    {
      "name": "tf_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/savedmodel_example.zip",
      "processor": "tensorflow_cpu_2.3",
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 0,
        "memory": 2000
      }
    }
  • Use DSW to deploy the model

    Modify the service.json service configuration file. The method is similar to the method of deploying models by using the EASCMD client. For more information, see Deploy models.
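As a reference for the conversion step that is mentioned at the beginning of this section, the following minimal Python sketch exports a Keras model in the SavedModel format by using TensorFlow 2.x. The model architecture and export path are placeholders; replace them with your own model.

    import tensorflow as tf

    # Placeholder Keras model: replace with your own model.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])

    # Export the model in the SavedModel format, which the processor can load.
    tf.saved_model.save(model, "./savedmodel_example/1")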

PyTorch1.6 processor with a built-in optimization engine based on PAI-Blade of the agility edition

The PyTorch1.6 processor that EAS provides can load models in the TorchScript format. For more information, see TorchScript.
Note
  • The general-purpose processor does not support PyTorch extensions. Inputs and outputs of types other than tensors are not supported.
  • The PyTorch1.6 processor provides a built-in optimization engine based on PAI-Blade of the agility edition. You can use this processor to deploy PyTorch models that are optimized by PAI-Blade of the agility edition.
You can deploy a TorchScript model by using one of the following methods:
  • Upload the model file to the console

    Set the Processor Type parameter to PyTorch1.6. For more information, see Upload and deploy models in the console.

  • Use Machine Learning Studio to deploy the model

    For more information, see Use Machine Learning Studio to deploy models.

  • Use the EASCMD client to deploy the model
    In the service.json service configuration file, set the processor parameter to pytorch_cpu_1.6 or pytorch_gpu_1.6 based on the model resources. If the value of the processor parameter does not match the type of resource, a deployment error occurs. The following code block shows an example:
    {
      "name": "pytorch_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/torchscript_model.pt",
      "processor": "pytorch_gpu_1.6",
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 1,
        "cuda": "10.0",
        "memory": 2000
      }
    }
  • Use DSW to deploy the model

    Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Deploy models. For more information about the parameters in the service configuration file, see Create a service.
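As a reference for the TorchScript requirement that is mentioned at the beginning of this section, the following minimal Python sketch converts a PyTorch model to the TorchScript format by tracing. The model and input shape are placeholders; replace them with your own model.

    import torch

    # Placeholder model: replace with your own model.
    model = torch.nn.Sequential(
        torch.nn.Linear(10, 16),
        torch.nn.ReLU(),
        torch.nn.Linear(16, 1),
    )
    model.eval()

    # Trace the model with an example input and save it in the TorchScript format.
    example_input = torch.rand(1, 10)
    traced = torch.jit.trace(model, example_input)
    traced.save("torchscript_model.pt")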

Caffe Processor

The Caffe processor that EAS provides can load deep learning models that are trained based on the Caffe framework. Due to the flexibility of the Caffe framework, you must specify the names of the model file and weight file in the model package when you deploy a Caffe model.
Note The general-purpose processor does not support custom data layers.
You can deploy a Caffe model by using one of the following methods:
  • Upload the model file to the console

    Set the Processor Type parameter to Caffe. For more information, see Upload and deploy models in the console.

  • Use Machine Learning Studio to deploy the model

    For more information, see Use Machine Learning Studio to deploy models.

  • Use the EASCMD client to deploy the model
    In the service.json service configuration file, set the processor parameter to caffe_cpu or caffe_gpu based on the model resources. If the value of the processor parameter does not match the type of resource, a deployment error occurs. The following code block shows an example:
    {
      "name": "caffe_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/caffe_model.zip",
      "processor": "caffe_cpu",
      "model_config": {
        "model": "deploy.prototxt",
        "weight": "bvlc_reference_caffenet.caffemodel"
      },
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 0,
        "memory": 2000
      }
    }
  • Use DSW to deploy the model

    Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Deploy models.
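The model package in the preceding example is an archive that contains the model definition file and the weight file whose names are specified in the model_config section. The following minimal Python sketch builds such a package; the file names match the preceding example and are otherwise placeholders.

    import zipfile

    # Package the model definition and weight files that model_config references.
    with zipfile.ZipFile("caffe_model.zip", "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write("deploy.prototxt")
        archive.write("bvlc_reference_caffenet.caffemodel")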

EasyNLP Processor

The EasyNLP processor that EAS provides can load deep learning natural language processing (NLP) models that are trained based on the EasyTransfer framework.

You can deploy an EasyTransfer model by using one of the following methods:
  • Upload the model file to the console

    Set the Processor Type parameter to EasyNLP. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy the model
    In the service.json service configuration file, set the processor parameter to easy_nlp_cpu_tf1.12 or easy_nlp_gpu_tf1.12 based on the model resources. If the value of the processor parameter does not match the type of resource, a deployment error occurs. Set the type parameter in the model_config section to the type of model that is trained. The following code block shows an example. For more information about other parameters, see Create a service.
    • Deploy the model on a GPU node
      {
        "name": "ev_app_demo",
        "generate_token": "true",
        "model_path": "http://xxxxx/your_model.zip",
        "processor": "easy_nlp_gpu_tf1.12",
        "model_config": "{\"type\":\"text_classify_bert\"}",
        "metadata": {
          "resource": "your_resource_name",
          "cuda": "9.0",
          "instance": 1,
          "memory": 4000,
          "gpu": 1,
          "cpu": 4,
          "rpc.worker_threads" : 5
        }
      }
    • Deploy the model on a CPU node
      {
        "name": "easynlp_serving_test",
        "generate_token": "true",
        "model_path": "http://xxxxx/your_model.zip",
        "processor": "easy_nlp_cpu_tf1.12",
        "model_config": "{\"type\":\"text_classify_bert\"}",
        "metadata": {
          "resource": "your_resource_name",
          "instance": 1,
          "gpu": 0,
          "cpu": 4,
          "rpc.worker_threads" : 5
        }
      }

EasyNLP_with_Transformer Processor

The EasyNLP_with_Transformer processor that EAS provides can load transformer models that are trained based on the EasyTexMiner framework.

You can deploy a transformer model by using one of the following methods:
  • Upload the model file to the console

    Set the Processor Type parameter to easy_nlp_with_transformer_gpu. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy the model
    In the service.json service configuration file, set the processor parameter to the corresponding processor code, which is easy_nlp_with_transformer_gpu. Then, set the type parameter in the model_config section to the type of model that is trained. The following code block shows an example. For more information about other parameters, see Create a service.
    {
      "name": "news_title_zh",
      "generate_token": "true",
      "processor": "easy_nlp_with_transformer_gpu",
      "model_path": "path-to-model/model_best_from_zqkd.tar.gz",
      "model_config": "{\"type\":\"news_title_generation\"}",
      "metadata": {
        "eas.handlers.disable_failure_handler": true,
        "resource": "eas-r-6krxn2f5pjt5mt****",
        "cpu": 4,
        "instance": 1,
        "cuda": "9.0",
        "memory": 8192,
        "gpu_memory": 8
      }
    }

EasyVision Processor

The EasyVision processor that EAS provides can load deep learning models that are trained based on the EasyVision framework.

You can deploy an EasyVision model by using one of the following methods:
  • Upload the model file to the console

    Set the Processor Type parameter to EasyVision. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy the model
    In the service.json service configuration file, set the processor parameter to easy_vision_cpu_tf1.12_torch151 or easy_vision_gpu_tf1.12_torch151 based on the model resources. If the value of the processor parameter does not match the type of resource, a deployment error occurs. Set the type parameter in the model_config section to the type of the model that is trained. The following code block shows an example. For more information about other parameters, see Create a service.
    • Deploy the model on a GPU node
      {
        "name": "ev_app_demo",
        "processor": "easy_vision_gpu_tf1.12_torch151",
        "model_path": "oss://path/to/your/model",
        "model_config": "{\"type\":\"classifier\"}",
        "metadata": {
          "resource": "your_resource_name",
          "cuda": "9.0",
          "instance": 1,
          "memory": 4000,
          "gpu": 1,
          "cpu": 4,
          "rpc.worker_threads" : 5
        }
      }
    • Deploy the model on a CPU node
      {
        "name": "ev_app_cpu_demo",
        "processor": "easy_vision_cpu_tf1.12_torch151",
        "model_path": "oss://path/to/your/model",
        "model_config": "{\"type\":\"classifier\"}",
        "metadata": {
          "resource": "your_resource_name",
          "instance": 1,
          "memory": 4000,
          "gpu": 0,
          "cpu": 4,
          "rpc.worker_threads" : 5
        }
      }

MediaFlow Processor

The MediaFlow processor that EAS provides is a general-purpose orchestration engine that can analyze and process video, audio, and images.

You can deploy a MediaFlow model by using one of the following methods:
  • Upload the model file to the console

    Set the Processor Type parameter to MediaFlow. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy the model
    In the service.json service configuration file, set the processor parameter to mediaflow. In addition, you must set the following parameters if you use the MediaFlow processor to deploy models. For more information about other parameters, see Create a service.
    • graph_pool_size: the number of graph pools.
    • worker_threads: the number of worker threads.
    The following code block shows an example:
    • Deploy a model for video classification
      {
        "model_entry": "video_classification/video_classification_ext.js", 
        "name": "video_classification", 
        "model_path": "oss://path/to/your/model", 
        "generate_token": "true", 
        "processor": "mediaflow", 
        "model_config" : {
            "graph_pool_size":8,
            "worker_threads":16
        },
        "metadata": {
          "eas.handlers.disable_failure_handler" :true,
          "resource": "your_resource_name", 
            "rpc.worker_threads": 30,
            "rpc.enable_jemalloc": true,
          "rpc.keepalive": 500000, 
          "cpu": 4, 
          "instance": 1, 
          "cuda": "9.0", 
          "rpc.max_batch_size": 64, 
          "memory": 10000, 
          "gpu": 1 
        }
      }
    • Deploy a model for automated speech recognition (ASR)
      {
        "model_entry": "asr/video_asr_ext.js", 
        "name": "video_asr", 
        "model_path": "oss://path/to/your/model", 
        "generate_token": "true", 
        "processor": "mediaflow", 
        "model_config" : {
            "graph_pool_size":8,
            "worker_threads":16
        },
        "metadata": {
          "eas.handlers.disable_failure_handler" :true,
          "resource": "your_resource_name", 
            "rpc.worker_threads": 30,
            "rpc.enable_jemalloc": true,
          "rpc.keepalive": 500000, 
          "cpu": 4, 
          "instance": 1, 
          "cuda": "9.0", 
          "rpc.max_batch_size": 64, 
          "memory": 10000, 
          "gpu": 1 
        }
      }
    In the service.json service configuration file, the values of the model_entry, name, and model_path parameters vary for video classification and ASR. You must modify the parameters based on the purpose of the model.

Triton Processor

Triton Inference Server is a new-generation online service framework released by NVIDIA. Triton Inference Server simplifies the deployment and management of GPU-accelerated models and complies with the API standards of KFServing. In addition, Triton Inference Server has the following features:
  • Supports multiple open source frameworks such as TensorFlow, PyTorch, ONNX Runtime, TensorRT, and custom framework backends.
  • Concurrently runs multiple models on one GPU to maximize GPU utilization.
  • Supports the HTTP and gRPC protocols and allows you to send requests in binary format to reduce the request size.
  • Supports the dynamic batching feature to improve service throughput.
EAS provides a built-in Triton processor.
Note
  • The Triton processor is in public preview in the China (Shanghai) region. The processor is unavailable in other regions.
  • The models that are deployed by using the Triton processor must be stored in Object Storage Service (OSS). Therefore, you must activate OSS and upload model files to OSS before you can use the Triton processor to deploy models. For more information about how to upload objects to OSS, see Upload objects.
  • Only exclusive resource groups in EAS support the Triton processor.
The following content describes how to use the Triton processor to deploy a model as a service and how to call the service:
  • Use the Triton processor to deploy a model
    You can use the Triton processor to deploy models only by using the EASCMD client. For more information about how to use the EASCMD client to deploy models, see Create a service. In the service.json service configuration file, set the processor parameter to triton. In addition, you must set the parameters related to OSS so that the Triton processor can obtain model files from OSS. The following code block shows how to modify the service.json service configuration file:
    {
      "name": "triton_test",                          
      "processor": "triton",
      "processor_params": [
        "--model-repository=oss://triton-model-repo/models", 
        "--allow-http=true", 
      ],
      "metadata": {
        "instance": 1,
        "cpu": 4,
        "gpu": 1,
        "memory": 10000,
        "resource":"<your resource id>"
      }
    }
    The following table describes the parameters that are required if you use the Triton processor to deploy models. For more information about other parameters, see Parameters in the service.json file.
    Parameter | Description
    processor_params | The parameters that you want to pass to Triton Inference Server when the deployment starts. Parameters that are not supported are automatically filtered out by Triton Inference Server. Table 1 describes the parameters that can be passed to Triton Inference Server. The model-repository parameter is required. For more information about the optional parameters, see main.cc.
    oss_endpoint | The OSS endpoint. If you do not specify an endpoint, the system automatically uses the OSS service in the region where the EAS service is deployed. If you want to use the OSS service that is activated in another region, you must set this parameter. For more information about the valid values of this parameter, see Regions and endpoints.
    metadata.resource | The ID of the exclusive resource group that is used to deploy the model in EAS. If you deploy a model by using the Triton processor, the resources must belong to an exclusive resource group in EAS. For more information about how to create an exclusive resource group in EAS, see Dedicated resource groups.
    Table 1. Parameters that can be passed to Triton Inference Server
    Parameter | Required | Description
    model-repository | Yes | The OSS path of the model. You must set this parameter to a subdirectory of an OSS bucket instead of the root directory of the bucket. For example, if you set the parameter to oss://triton-model-repo/models, triton-model-repo is the name of the OSS bucket and models is a subdirectory of the bucket.
    log-verbose, log-info, log-warning, log-error, exit-on-error, strict-model-config, strict-readiness, allow-http, http-thread-count, pinned-memory-pool-byte-size, cuda-memory-pool-byte-size, min-supported-compute-capability, buffer-manager-thread-count, backend-config | No | For more information, see main.cc.
  • Use the official Triton client to call the service deployed by using the Triton processor
    Before you use the Triton client for Python to call the deployed service, run the following commands to install the official Triton client:
    pip3 install nvidia-pyindex
    pip3 install tritonclient[all]
    Run the following command to download a test image to the current directory:
    wget http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/doc-assets/cat.png
    The following code block shows how to use the Triton client for Python to send a request in binary format to the service that is deployed by using the Triton processor:
    import numpy as np
    import time
    from PIL import Image
    
    import tritonclient.http as httpclient
    from tritonclient.utils import InferenceServerException
    
    URL = "<service url>"  # Replace <service url> with the endpoint of the deployed service. 
    HEADERS = {"Authorization": "<service token>"} # Replace <service token> with the token that is used to access the service. 
    input_img = httpclient.InferInput("input", [1, 299, 299, 3], "FP32")
    img = Image.open('./cat.png').resize((299, 299))
    img = np.asarray(img).astype('float32') / 255.0
    input_img.set_data_from_numpy(img.reshape([1, 299, 299, 3]), binary_data=True)
    
    output = httpclient.InferRequestedOutput(
        "InceptionV3/Predictions/Softmax", binary_data=True
    )
    triton_client = httpclient.InferenceServerClient(url=URL, verbose=False)
    
    start = time.time()
    for i in range(10):
        results = triton_client.infer(
            "inception_graphdef", inputs=[input_img], outputs=[output], headers=HEADERS
        )
        res_body = results.get_response()
        elapsed_ms = (time.time() - start) * 1000
        if i == 0:
            print("model name: ", res_body["model_name"])
            print("model version: ", res_body["model_version"])
            print("output name: ", res_body["outputs"][0]["name"])
            print("output shape: ", res_body["outputs"][0]["shape"])
        print("[{}] Avg rt(ms): {:.2f}".format(i, elapsed_ms))
        start = time.time()