Elastic Algorithm Service (EAS) of Machine Learning Platform for AI provides built-in PyTorch processors. You can use PyTorch processors to deploy models in the TorchScript format as online model services. This topic describes how to deploy and call PyTorch model services.
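Before deployment, a PyTorch model must be converted to the TorchScript format. The following code block is a minimal sketch that traces a torchvision ResNet-18 model and saves it as resnet18.pt; the model and file name are used only for illustration:

import torch
import torchvision

# Load a model and switch it to inference mode. The torchvision ResNet-18
# model is used here only as an example.
model = torchvision.models.resnet18(pretrained=True)
model.eval()

# Trace the model with a dummy input to produce a TorchScript module.
example_input = torch.rand(1, 3, 224, 224)
traced_model = torch.jit.trace(model, example_input)

# Save the TorchScript model to a .pt file. Upload this file, for example
# to OSS, and reference it in the model_path field of the service configuration.
traced_model.save('resnet18.pt')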

PyTorch processor versions

Each PyTorch processor corresponds to a specific PyTorch version and is available as a CPU version or a GPU-accelerated version. The following table describes the available PyTorch processors and their corresponding PyTorch versions.
Processor name   | PyTorch version | Support for GPU acceleration
pytorch_cpu_1.6  | PyTorch 1.6     | No
pytorch_cpu_1.7  | PyTorch 1.7     | No
pytorch_cpu_1.9  | PyTorch 1.9     | No
pytorch_cpu_1.10 | PyTorch 1.10    | No
pytorch_gpu_1.6  | PyTorch 1.6     | Yes
pytorch_gpu_1.7  | PyTorch 1.7     | Yes
pytorch_gpu_1.9  | PyTorch 1.9     | Yes
pytorch_gpu_1.10 | PyTorch 1.10    | Yes

Step 1: Deploy a model service

When you use the EASCMD client to deploy a PyTorch model service, you must set the processor parameter to the name of one of the preceding PyTorch processors. The following code block shows a sample service configuration file:
{

  "name": "pytorch_resnet_example",
  "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/resnet18.pt",
  "processor": "pytorch_cpu_1.6",
    "metadata": {
    "cpu": 1,
    "instance": 1,
    "memory": 1000
  }
}
For more information about how to use the EASCMD client to deploy model services, see Deploy model services by using EASCMD or DSW.
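If you deploy the model with a GPU-accelerated processor, you must also request GPU resources in the metadata section. The following configuration file is a hypothetical sketch based on the preceding example; the processor name comes from the preceding table, and the resource values are placeholders that you need to adjust for your scenario:

{
  "name": "pytorch_resnet_example_gpu",
  "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/resnet18.pt",
  "processor": "pytorch_gpu_1.10",
  "metadata": {
    "gpu": 1,
    "instance": 1,
    "memory": 4000
  }
}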

You can also use the console to deploy PyTorch model services. For more information, see Model service deployment by using the PAI console and Machine Learning Designer.

Step 2: Call the model service

The input and output of PyTorch model services are in the protocol buffers format, not plaintext. Because the online debugging feature in the console supports only plaintext input and output data, you cannot use this feature to call PyTorch model services.

EAS provides SDKs for multiple programming languages. The SDKs encapsulate the request and response formats and the direct connection and fault tolerance mechanisms. We recommend that you use an EAS SDK to create and send service requests. The following code block shows a sample inference request:
#!/usr/bin/env python

from eas_prediction import PredictClient
from eas_prediction import TorchRequest

if __name__ == '__main__':
    # Initialize the client with the service endpoint and the service name.
    client = PredictClient('http://182848887922****.cn-shanghai.pai-eas.aliyuncs.com', 'pytorch_gpu_wl')
    # client.set_token('<service_token>')  # Required if token authentication is enabled for the service.
    client.init()

    # Construct a request whose input 0 is a 1 x 3 x 224 x 224 float tensor.
    req = TorchRequest()
    req.add_feed(0, [1, 3, 224, 224], TorchRequest.DT_FLOAT, [1] * 150528)
    # Optionally specify which outputs to return. By default, all outputs are returned.
    # req.add_fetch(0)
    for x in range(0, 10):
        resp = client.predict(req)
        # Print the shape of output tensor 0.
        print(resp.get_tensor_shape(0))
For more information about the parameters in the sample request and how to call model services, see SDK for Python.

You can also create custom service requests without using EAS SDKs. For more information, see Request syntax.

Request syntax

Both the input and output of PyTorch processors are in the protocol buffers format. If you use EAS SDKs to send service requests, the request logic is encapsulated for you, and you only need to call the functions provided by the SDKs to create requests. You can also create custom service requests based on the following syntax without using EAS SDKs. For more information, see Construct a request for a TensorFlow service.
syntax = "proto3";

package pytorch.eas;
option cc_enable_arenas = true;

enum ArrayDataType {
  // Not a legal value for DataType. Used to indicate a DataType field
  // has not been set
  DT_INVALID = 0;

  // Data types that all computation devices are expected to be
  // capable to support
  DT_FLOAT = 1;
  DT_DOUBLE = 2;
  DT_INT32 = 3;
  DT_UINT8 = 4;
  DT_INT16 = 5;
  DT_INT8 = 6;
  DT_STRING = 7;
  DT_COMPLEX64 = 8;  // Single-precision complex
  DT_INT64 = 9;
  DT_BOOL = 10;
  DT_QINT8 = 11;     // Quantized int8
  DT_QUINT8 = 12;    // Quantized uint8
  DT_QINT32 = 13;    // Quantized int32
  DT_BFLOAT16 = 14;  // Float32 truncated to 16 bits.  Only for cast ops
  DT_QINT16 = 15;    // Quantized int16
  DT_QUINT16 = 16;   // Quantized uint16
  DT_UINT16 = 17;
  DT_COMPLEX128 = 18;  // Double-precision complex
  DT_HALF = 19;
  DT_RESOURCE = 20;
  DT_VARIANT = 21;  // Arbitrary C++ data types
}

// Dimensions of an array
message ArrayShape {
  repeated int64 dim = 1 [packed = true];
}

// Protocol buffer representing an array
message ArrayProto {
  // Data Type
  ArrayDataType dtype = 1;

  // Shape of the array.
  ArrayShape array_shape = 2;

  // DT_FLOAT
  repeated float float_val = 3 [packed = true];

  // DT_DOUBLE
  repeated double double_val = 4 [packed = true];

  // DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
  repeated int32 int_val = 5 [packed = true];

  // DT_STRING
  repeated bytes string_val = 6;

  // DT_INT64.
  repeated int64 int64_val = 7 [packed = true];

}


message PredictRequest {

  // Input tensors.
  repeated ArrayProto inputs = 1;

  // Output filter.
  repeated int32 output_filter = 2;
}

// Response for PredictRequest on successful run.
message PredictResponse {
  // Output tensors.
  repeated ArrayProto outputs = 1;
}
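As an illustration, the following Python sketch builds a PredictRequest directly from the preceding definition and sends it over HTTP without an EAS SDK. It assumes that the .proto file has been compiled with protoc into a module named pytorch_predict_pb2; the endpoint, service name, and token are placeholders:

import requests

import pytorch_predict_pb2  # Generated by: protoc --python_out=. pytorch_predict.proto

# Build the request: one 1 x 3 x 224 x 224 float tensor as input 0.
request = pytorch_predict_pb2.PredictRequest()
tensor = request.inputs.add()
tensor.dtype = pytorch_predict_pb2.DT_FLOAT
tensor.array_shape.dim.extend([1, 3, 224, 224])
tensor.float_val.extend([1.0] * (1 * 3 * 224 * 224))

# Serialize the request and send it to the service. The endpoint, service
# name, and token below are placeholders.
url = 'http://182848887922****.cn-shanghai.pai-eas.aliyuncs.com/api/predict/pytorch_gpu_wl'
headers = {'Authorization': '<service_token>'}
resp = requests.post(url, data=request.SerializeToString(), headers=headers)

# Parse the protocol buffers response and print the shape of output tensor 0.
response = pytorch_predict_pb2.PredictResponse()
response.ParseFromString(resp.content)
print(response.outputs[0].array_shape.dim)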