Platform for AI: PyTorch processor

Last Updated: Mar 05, 2026

Elastic Algorithm Service (EAS) provides a built-in PyTorch processor for deploying standard PyTorch models in TorchScript format as online services. This topic describes how to deploy and call PyTorch model services.

PyTorch processor version guide

The PyTorch processor is available in multiple versions for both CPU and GPU. The following table lists the processor name for each version.

Processor name      PyTorch version   Supports GPU
pytorch_cpu_1.6     PyTorch 1.6       No
pytorch_cpu_1.7     PyTorch 1.7       No
pytorch_cpu_1.9     PyTorch 1.9       No
pytorch_cpu_1.10    PyTorch 1.10      No
pytorch_gpu_1.6     PyTorch 1.6       Yes
pytorch_gpu_1.7     PyTorch 1.7       Yes
pytorch_gpu_1.9     PyTorch 1.9       Yes
pytorch_gpu_1.10    PyTorch 1.10      Yes

Step 1: Deploy a service

When you use the eascmd client to deploy a PyTorch model service, set the processor parameter to one of the PyTorch processor names from the table above. The following code provides a sample service configuration file.

{
  "name": "pytorch_resnet_example",
  "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/resnet18.pt",
  "processor": "pytorch_cpu_1.6",
  "metadata": {
    "cpu": 1,
    "instance": 1,
    "memory": 1000
  }
}

For more information about how to use the client tool to deploy services, see Service deployment: EASCMD & DSW.

You can also deploy PyTorch model services using the console. For more information, see Service deployment: Console.

Step 2: Call the service

PyTorch services use the Protocol Buffers (Protobuf) format for input and output, not plain text. Because the online debugging feature in the console supports only plain-text data, you cannot use it to debug PyTorch services.

EAS provides software development kits (SDKs) for different programming languages. The SDKs encapsulate request and response data and include mechanisms for direct connections and fault tolerance. You can use an SDK to build and send requests. The following code provides a sample inference request.

#!/usr/bin/env python

from eas_prediction import PredictClient
from eas_prediction import TorchRequest

if __name__ == '__main__':
    # Replace the endpoint and service name with your own values.
    client = PredictClient('http://182848887922****.cn-shanghai.pai-eas.aliyuncs.com', 'pytorch_gpu_wl')
    client.init()

    req = TorchRequest()
    # Feed input tensor 0: shape [1, 3, 224, 224] with 1*3*224*224 = 150528
    # float values (all ones here, as placeholder data).
    req.add_feed(0, [1, 3, 224, 224], TorchRequest.DT_FLOAT, [1] * 150528)
    # Optionally restrict which output tensors are returned:
    # req.add_fetch(0)
    for x in range(0, 10):
        resp = client.predict(req)
        print(resp.get_tensor_shape(0))  # shape of output tensor 0
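The last argument of add_feed is a flat list of values in row-major order, so multi-dimensional input such as an image must be flattened before it is sent. The following sketch uses only the standard library and a small dummy 1 x 3 x 2 x 2 tensor (stand-ins for real preprocessed image data) to show the kind of flat list that add_feed expects.

```python
# Sketch: flattening a nested tensor into the flat float list that
# TorchRequest.add_feed expects (row-major order). The nested-list
# input below is dummy data standing in for a preprocessed image.

def flatten(values):
    """Recursively flatten a nested list into a flat list of floats."""
    flat = []
    for v in values:
        if isinstance(v, list):
            flat.extend(flatten(v))
        else:
            flat.append(float(v))
    return flat

# A dummy 1 x 3 x 2 x 2 "image" (a small shape, for illustration).
tensor = [[[[1, 2], [3, 4]],
           [[5, 6], [7, 8]],
           [[9, 10], [11, 12]]]]

data = flatten(tensor)
# len(data) == 1 * 3 * 2 * 2 == 12; ready to pass as
# req.add_feed(0, [1, 3, 2, 2], TorchRequest.DT_FLOAT, data)
```

For a real 1 x 3 x 224 x 224 input, the flattened list would contain 150528 floats, matching the example above.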

For more information about the parameter settings and invocation methods in the code, see Use the Python SDK.

You can also build your own service requests. For more information, see Request format.

Request format

The PyTorch processor uses the Protobuf format for input and output. When you send a request through an SDK, the encapsulation is handled for you, and you only need to call the functions that the SDK provides. If you want to build service requests yourself, generate the required code from the following .proto definition. For more information, see Construct a request for a TensorFlow service.

syntax = "proto3";

package pytorch.eas;
option cc_enable_arenas = true;

enum ArrayDataType {
  // Not a legal value for DataType. Used to indicate a DataType field
  // has not been set
  DT_INVALID = 0;

  // Data types that all computation devices are expected to be
  // capable to support
  DT_FLOAT = 1;
  DT_DOUBLE = 2;
  DT_INT32 = 3;
  DT_UINT8 = 4;
  DT_INT16 = 5;
  DT_INT8 = 6;
  DT_STRING = 7;
  DT_COMPLEX64 = 8;  // Single-precision complex
  DT_INT64 = 9;
  DT_BOOL = 10;
  DT_QINT8 = 11;     // Quantized int8
  DT_QUINT8 = 12;    // Quantized uint8
  DT_QINT32 = 13;    // Quantized int32
  DT_BFLOAT16 = 14;  // Float32 truncated to 16 bits.  Only for cast ops
  DT_QINT16 = 15;    // Quantized int16
  DT_QUINT16 = 16;   // Quantized uint16
  DT_UINT16 = 17;
  DT_COMPLEX128 = 18;  // Double-precision complex
  DT_HALF = 19;
  DT_RESOURCE = 20;
  DT_VARIANT = 21;  // Arbitrary C++ data types
}

// Dimensions of an array
message ArrayShape {
  repeated int64 dim = 1 [packed = true];
}

// Protocol buffer representing an array
message ArrayProto {
  // Data Type
  ArrayDataType dtype = 1;

  // Shape of the array.
  ArrayShape array_shape = 2;

  // DT_FLOAT
  repeated float float_val = 3 [packed = true];

  // DT_DOUBLE
  repeated double double_val = 4 [packed = true];

  // DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
  repeated int32 int_val = 5 [packed = true];

  // DT_STRING
  repeated bytes string_val = 6;

  // DT_INT64.
  repeated int64 int64_val = 7 [packed = true];

}


message PredictRequest {

  // Input tensors.
  repeated ArrayProto inputs = 1;

  // Output filter.
  repeated int32 output_filter = 2;
}

// Response for PredictRequest on successful run.
message PredictResponse {
  // Output tensors.
  repeated ArrayProto outputs = 1;
}