The built-in TorchEasyRec Processor in EAS lets you deploy recommendation models trained with TorchEasyRec or PyTorch as a scoring service with integrated feature engineering. By jointly optimizing feature engineering and the PyTorch model, the processor provides a high-performance scoring service. This topic describes how to deploy and invoke a TorchEasyRec model service.
Background
The following diagram shows the architecture of a recommendation engine based on TorchEasyRec Processor:
The TorchEasyRec Processor includes the following modules:
Item Feature Cache: Caches item features from FeatureStore in memory to reduce network overhead, lower the load on FeatureStore, and improve inference service performance. If these item features include real-time features, FeatureStore handles their synchronization.
Feature Generator (FG): Applies the feature transformations defined in a configuration file. A single C++ codebase keeps the offline and online processing logic consistent.
TorchModel: A PyTorch model, trained with TorchEasyRec or PyTorch and exported as a ScriptedModel.
Limitations
This feature supports only general-purpose instance families g6, g7, and g8, and GPU models such as T4 and A10. For more information, see general-purpose instance families (g series). If you deploy a GPU service, ensure that the CUDA Driver version is 535 or later.
Version history
The TorchEasyRec Processor is in active development. We recommend using the latest version to deploy your inference service, as newer versions provide more features and higher performance. A list of published versions is provided below:
Processor name | Release date | Torch version | FG version | Updates |
easyrec-torch-0.1 | 2024-09-10 | 2.4 | 0.2.9 | |
easyrec-torch-0.2 | 2024-09-30 | 2.4 | 0.2.9 | |
easyrec-torch-0.3 | 2024-10-14 | 2.4 | 0.2.9 | |
easyrec-torch-0.4 | 2024-10-28 | 2.4 | 0.3.1 | |
easyrec-torch-0.5 | 2024-11-14 | 2.4 | 0.3.1 | |
easyrec-torch-0.6 | 2024-11-18 | 2.4 | 0.3.6 | |
easyrec-torch-0.7 | 2024-12-06 | 2.5 | 0.3.9 | |
easyrec-torch-0.8 | 2024-12-25 | 2.5 | 0.3.9 | |
easyrec-torch-0.9 | 2025-01-15 | 2.5 | 0.4.1 | |
easyrec-torch-1.0 | 2025-02-06 | 2.5 | 0.4.2 | |
easyrec-torch-1.1 | 2025-04-23 | 2.5 | 0.5.9 | |
easyrec-torch-1.2 | 2025-05-12 | 2.5 | 0.6.0 | |
easyrec-torch-1.3 | 2025-05-29 | 2.5 | 0.6.5 | |
easyrec-torch-1.4 | 2025-07-15 | 2.5 | 0.6.9 | |
easyrec-torch-1.5 | 2025-09-18 | 2.5 | 0.7.3 | |
easyrec-torch-1.6 | 2025-10-21 | 2.5 | 0.7.4 | |
easyrec-torch-1.7 | 2025-11-04 | 2.5 | 0.7.4 | |
easyrec-torch-1.8 | 2025-12-01 | 2.5 | 0.7.4 | |
easyrec-torch-1.9 | 2026-01-09 | 2.5 | 1.0.0 | |
easyrec-torch-1.10 | 2026-01-23 | 2.5 | 1.0.1 | |
easyrec-torch-1.11 | 2026-02-10 | 2.5 | 1.0.1 | |
easyrec-torch-1.12 | 2026-03-13 | 2.5 | 1.0.1 | |
Notes on version 2.0 and later | | | | |
easyrec-torch-2.0 | 2026-03-17 | 2.8 | 1.0.1 | |
easyrec-torch-2.1 | 2026-04-09 | 2.8 | 1.0.2 | |
easyrec-torch-2.2 | 2026-04-29 | 2.8 | 1.0.5 | |
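Because processor names embed a two-part version number, a plain string sort ranks easyrec-torch-1.12 before easyrec-torch-1.9. The following sketch shows one way to pick the newest entry by comparing the version numerically. The helper names version_key and latest_processor are illustrative and are not part of EAS or any SDK; the list is a subset of the names in the version history.

```python
import re

# A subset of the processor names listed in the version history.
PROCESSORS = [
    "easyrec-torch-0.9",
    "easyrec-torch-1.9",
    "easyrec-torch-1.10",
    "easyrec-torch-1.12",
    "easyrec-torch-2.2",
]


def version_key(processor_name):
    """Return (major, minor) as integers so that 1.12 sorts after 1.9."""
    match = re.fullmatch(r"easyrec-torch-(\d+)\.(\d+)", processor_name)
    if match is None:
        raise ValueError(f"unexpected processor name: {processor_name}")
    return int(match.group(1)), int(match.group(2))


def latest_processor(names):
    """Pick the name with the highest numeric version."""
    return max(names, key=version_key)


if __name__ == "__main__":
    print(latest_processor(PROCESSORS))  # prints easyrec-torch-2.2
```

The result can then be used as the value of the processor field in the service configuration.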
Step 1: Deploy a service
Prepare the service configuration file torcheasyrec.json.
Set the processor type to easyrec-torch-{version}, where {version} is a version from the version history. The following examples show JSON configurations:
Example with FG (fg_mode='normal')
{
  "metadata": {
    "instance": 1,
    "name": "alirec_rank_with_fg",
    "rpc": {
      "enable_jemalloc": 1,
      "max_queue_size": 256,
      "worker_threads": 16
    }
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6i-c16g1.4xlarge"
    }
  },
  "model_config": {
    "fg_mode": "normal",
    "fg_threads": 8,
    "region": "YOUR_REGION",
    "fs_project": "YOUR_FS_PROJECT",
    "fs_model": "YOUR_FS_MODEL",
    "fs_entity": "item",
    "load_feature_from_offlinestore": true,
    "access_key_id": "YOUR_ACCESS_KEY_ID",
    "access_key_secret": "YOUR_ACCESS_KEY_SECRET"
  },
  "storage": [
    {
      "mount_path": "/home/admin/docker_ml/workspace/model/",
      "oss": {
        "path": "oss://xxx/xxx/export",
        "readOnly": false
      },
      "properties": {
        "resource_type": "code"
      }
    }
  ],
  "processor": "easyrec-torch-1.12"
}
Example without FG (fg_mode='bypass')
{
  "metadata": {
    "instance": 1,
    "name": "alirec_rank_no_fg",
    "rpc": {
      "enable_jemalloc": 1,
      "max_queue_size": 256,
      "worker_threads": 16
    }
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6i-c16g1.4xlarge"
    }
  },
  "model_config": {
    "fg_mode": "bypass"
  },
  "storage": [
    {
      "mount_path": "/home/admin/docker_ml/workspace/model/",
      "oss": {
        "path": "oss://xxx/xxx/export",
        "readOnly": false
      },
      "properties": {
        "resource_type": "code"
      }
    }
  ],
  "processor": "easyrec-torch-1.12"
}
The following table describes the key parameters. For other parameters, see JSON Deployment.
Parameter
Required
Description
Example
processor
Yes
The TorchEasyRec processor.
"processor":"easyrec-torch-1.12"
path
Yes
The OSS path mounted to the service to store model files.
"path": "oss://examplebucket/xxx/export"
fg_mode
No
Specifies the feature engineering (FG) mode. Valid values:
bypass (default): Disables FG. Only the Torch model is deployed. This mode is suitable for custom feature processing scenarios. In this mode, you do not need to configure parameters for the processor to access FeatureStore.
normal: Enables FG. This mode is typically used with models trained with TorchEasyRec.
"fg_mode": "normal"
fg_threads
No
The number of concurrent FG threads per request.
"fg_threads": 15
outputs
No
The names of output variables from the Torch model, such as probs_ctr. Separate multiple names with a comma (,). If unspecified, all variables are returned.
"outputs":"probs_ctr,probs_cvr"
item_empty_score
No
The default score to return when an item ID does not exist. Default value: 0.
"item_empty_score": -1
Processor recall parameters
faiss_neigh_num
No
The number of items to retrieve for FAISS vector recall. The value is taken from the faiss_neigh_num field in the request. If the request does not provide this field, the value of faiss_neigh_num in model_config is used, which defaults to 1.
"faiss_neigh_num": 200
faiss_nprobe
No
The number of clusters to search during retrieval (the FAISS nprobe parameter). Default value: 800. In FAISS, an inverted file index divides data into multiple small clusters and maintains an inverted list for each cluster. A larger nprobe value usually results in higher search accuracy but increases computational cost and search time. Conversely, a smaller value reduces accuracy but speeds up the search.
"faiss_nprobe": 700
Processor parameters for FeatureStore access
fs_project
No
The name of the FeatureStore project. Required when using FeatureStore. For more information, see Configure a FeatureStore project.
"fs_project": "fs_demo"
fs_model
No
The name of the feature model in FeatureStore.
"fs_model": "fs_rank_v1"
fs_entity
No
The name of the entity in FeatureStore.
"fs_entity": "item"
region
No
The region where the FeatureStore product is located. For example, specify cn-beijing for the China (Beijing) region. For more information about regions, see Endpoints.
"region": "cn-beijing"
access_key_id
No
The AccessKey ID for accessing FeatureStore.
"access_key_id": "xxxxx"
access_key_secret
No
The AccessKey Secret for accessing FeatureStore.
"access_key_secret": "xxxxx"
load_feature_from_offlinestore
No
Specifies whether to load offline features directly from the FeatureStore OfflineStore. Valid values:
true: Loads data from the FeatureStore OfflineStore.
false (default): Loads data from the FeatureStore OnlineStore.
"load_feature_from_offlinestore": true
featuredb_username
No
The username for FeatureDB.
"featuredb_username":"xxx"
featuredb_password
No
The password for FeatureDB.
"featuredb_password":"xxx"
input_tile: Automatic feature expansion
INPUT_TILE
No
Enables automatic feature expansion. For features that have the same value across all items in a single request, such as user_id, you can send the value only once. This reduces the request size, network latency, and computation time.
This feature requires normal mode and a model exported from TorchEasyRec with the corresponding environment variable set. By default, the system reads the INPUT_TILE value from the model_acc.json file in the exported model directory. If this file does not exist, the value is read from the environment variable.
When enabled:
If set to 2: FG for user-side features is calculated only once.
If set to 3: FG for user-side features is calculated only once. The system calculates embeddings for user and item features separately, and the user-side embedding is also calculated only once. This is suitable for scenarios with a large number of user-side features.
"processor_envs":
[
  {
    "name": "INPUT_TILE",
    "value": "2"
  }
]
NO_GRAD_GUARD
No
Disables gradient calculation during inference. This stops operation tracking and prevents the construction of a computation graph.
Note: Setting this to 1 might cause incompatibility issues with some models. If the service hangs during a second inference run, you can resolve the issue by setting the PYTORCH_TENSOREXPR_FALLBACK=2 environment variable. This bypasses the compilation step while retaining some graph optimization functionality.
"processor_envs":
[
  {
    "name": "NO_GRAD_GUARD",
    "value": "1"
  }
]
Model warm-up parameters
warmup_data_path
No
Enables the model warm-up feature and specifies the path to save warm-up files. To prevent these files from being lost, you must mount an OSS path to this location in the storage configuration.
"warmup_data_path": "/warmup"
warmup_cnt_per_file
No
The number of warm-up iterations to run for each warm-up Protobuf (PB) file. A larger value ensures a more thorough warm-up but increases the warm-up time. Default value: 20.
"warmup_cnt_per_file": 20
warmup_pb_files_count
No
The number of online requests to save as PB files for warm-up at the next service startup. The files are saved to the path specified by warmup_data_path. Default value: 64.
"warmup_pb_files_count": 64
Slow request logging and saving
long_request_threshold
No
The time threshold in milliseconds (ms) for identifying a slow request. If a request's processing time exceeds this threshold, the system automatically records the execution time of each stage in the logs. Default value: 200.
"long_request_threshold": 200
save_long_request
No
Specifies whether to save requests whose processing time exceeds long_request_threshold as PB files. Default value: false. The PB files are saved to the torch_req folder under the model directory.
"save_long_request": true
Request and feature logging to OSS
request_log_path
No
The disk path where the PB files are saved. In the model service configuration, you must mount an OSS path to this location.
"request_log_path": "/online_log_pb"
background_feature_thread_num
No
The number of background threads for saving files to disk. If the disk-writing workload is heavy, you can increase this value to improve the throughput for saving PB files. Default value: 4.
"background_feature_thread_num": 8
Deploy the TorchEasyRec model service. You can use one of the following methods:
JSON Deployment (Recommended)
Follow these steps:
Log on to the PAI console. Select a region on the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Custom Model Deployment section, click JSON Deployment.
In the JSON editor, paste your JSON configuration and click Deploy.
eascmd client
Download and authenticate the client. This example uses the 64-bit Windows version of the client.
Open a terminal. In the directory containing the JSON file, run the following command to create the service. For more command information, see Command reference.
eascmdwin64.exe create <service.json>
Replace <service.json> with your JSON file name, such as torcheasyrec.json.
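Before you submit the configuration with either method, a quick local check can catch common mistakes. The following sketch is a hypothetical helper: check_service_config is not part of eascmd or any EAS SDK, and it only encodes rules stated in the parameter table. Flagging missing FeatureStore settings in normal mode is an assumption based on the example with FG; if your features do not come from FeatureStore, treat that message as a hint rather than an error.

```python
import json

# FeatureStore settings used by the "Example with FG" configuration.
FEATURESTORE_KEYS = ("fs_project", "fs_model", "fs_entity", "region")


def check_service_config(config):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    processor = config.get("processor", "")
    if not processor.startswith("easyrec-torch-"):
        problems.append("processor should be easyrec-torch-{version}")
    oss_paths = [entry.get("oss", {}).get("path", "")
                 for entry in config.get("storage", [])]
    if not any(path.startswith("oss://") for path in oss_paths):
        problems.append("storage should mount at least one oss:// path")
    model_config = config.get("model_config", {})
    fg_mode = model_config.get("fg_mode", "bypass")  # bypass is the default
    if fg_mode not in ("bypass", "normal"):
        problems.append(f"unknown fg_mode: {fg_mode!r}")
    if fg_mode == "normal":
        missing = [key for key in FEATURESTORE_KEYS if key not in model_config]
        if missing:
            problems.append("fg_mode is normal but FeatureStore settings "
                            "are missing: " + ", ".join(missing))
    return problems


if __name__ == "__main__":
    sample = json.loads("""
    {
      "model_config": {"fg_mode": "bypass"},
      "storage": [{"mount_path": "/home/admin/docker_ml/workspace/model/",
                   "oss": {"path": "oss://xxx/xxx/export"}}],
      "processor": "easyrec-torch-1.12"
    }
    """)
    print(check_service_config(sample) or "no problems found")
```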
Step 2: Call the service
After you deploy the TorchEasyRec model service, follow these steps to view the invocation information:
Log on to the PAI console, select the region at the top of the page and the workspace on the right, and then click Enter EAS.
In the Service Type column, click Invocation Information to view the service endpoint and token.

The TorchEasyRec model service uses Protobuf for input and output. The calling method depends on whether FG is enabled:
Use FG (fg_mode='normal')
The service supports the following two methods:
EAS Java SDK
Before you run the code, configure your Maven environment. For details, see Java SDK usage instructions. For the latest version of the EAS Java SDK, see https://github.com/pai-eas/eas-java-sdk. The following code shows how to send a request to the alirec_rank_with_fg service:
package com.aliyun.openservices.eas.predict;
import com.aliyun.openservices.eas.predict.http.Compressor;
import com.aliyun.openservices.eas.predict.http.HttpConfig;
import com.aliyun.openservices.eas.predict.http.PredictClient;
import com.aliyun.openservices.eas.predict.proto.TorchRecPredictProtos;
import com.aliyun.openservices.eas.predict.request.TorchRecRequest;
import com.aliyun.openservices.eas.predict.proto.TorchPredictProtos.ArrayProto;
import java.util.*;
public class TorchRecPredictTest {
    public static PredictClient InitClient() {
        return new PredictClient(new HttpConfig());
    }
    public static TorchRecRequest buildPredictRequest() {
        TorchRecRequest torchRecRequest = new TorchRecRequest();
        // Item to score.
        torchRecRequest.appendItemId("7033");
        // User features: scalar, list, map, and nested list.
        torchRecRequest.addUserFeature("user_id", 33981, "int");
        ArrayList<Double> list = new ArrayList<>();
        list.add(0.24689289764507472);
        list.add(0.005758482924454689);
        list.add(0.6765301324940026);
        list.add(0.18137273055602343);
        torchRecRequest.addUserFeature("raw_3", list, "List<double>");
        Map<String, Integer> myMap = new LinkedHashMap<>();
        myMap.put("866", 4143);
        myMap.put("1627", 2451);
        torchRecRequest.addUserFeature("map_1", myMap, "map<string,int>");
        ArrayList<ArrayList<Float>> list2 = new ArrayList<>();
        ArrayList<Float> innerList1 = new ArrayList<>();
        innerList1.add(1.1f);
        innerList1.add(2.2f);
        innerList1.add(3.3f);
        list2.add(innerList1);
        ArrayList<Float> innerList2 = new ArrayList<>();
        innerList2.add(4.4f);
        innerList2.add(5.5f);
        list2.add(innerList2);
        torchRecRequest.addUserFeature("click", list2, "list<list<float>>");
        // Context features.
        torchRecRequest.addContextFeature("id_2", list, "List<double>");
        torchRecRequest.addContextFeature("id_2", list, "List<double>");
        System.out.println(torchRecRequest.request);
        return torchRecRequest;
    }
    public static void main(String[] args) throws Exception {
        PredictClient client = InitClient();
        client.setToken("tokenGeneratedFromService");
        client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com");
        client.setModelName("alirec_rank_with_fg");
        client.setRequestTimeout(100000);
        testInvoke(client);
        testDebugLevel(client);
        client.shutdown();
    }
    public static void testInvoke(PredictClient client) throws Exception {
        long startTime = System.currentTimeMillis();
        TorchRecPredictProtos.PBResponse response = client.predict(buildPredictRequest());
        for (Map.Entry<String, ArrayProto> entry : response.getMapOutputsMap().entrySet()) {
            System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");
    }
    public static void testDebugLevel(PredictClient client) throws Exception {
        long startTime = System.currentTimeMillis();
        TorchRecRequest request = buildPredictRequest();
        // debug_level 1 returns the FG input and output features without running a prediction.
        request.setDebugLevel(1);
        TorchRecPredictProtos.PBResponse response = client.predict(request);
        Map<String, String> genFeas = response.getGenerateFeaturesMap();
        for (String itemId : genFeas.keySet()) {
            System.out.println(itemId);
            System.out.println(genFeas.get(itemId));
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");
    }
}
The key parameters are as follows:
client.setToken("tokenGeneratedFromService"): Replace the value with your service token. For example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com"): Replace the value with your service endpoint. For example, 175805416243****.cn-beijing.pai-eas.aliyuncs.com.
client.setModelName("alirec_rank_with_fg"): Replace the value with your service name.
EAS Python SDK
Before you run the code, run the pip install -U eas-prediction --user command to install or update the eas-prediction library. For more information, see Python SDK usage instructions. For the source code, see https://github.com/pai-eas/eas-python-sdk/blob/master/eas_prediction/torchrec_request.py. The following code is an example:
from eas_prediction import PredictClient
from eas_prediction.torchrec_request import TorchRecRequest
if __name__ == '__main__':
    # Replace the endpoint, service name, and token with your own values.
    endpoint = 'http://localhost:6016'
    client = PredictClient(endpoint, '<your_service_name>')
    client.set_token('<your_service_token>')
    client.init()
    torchrec_req = TorchRecRequest()
    # Scalar user features.
    torchrec_req.add_user_fea('user_id', 'u001d', "STRING")
    torchrec_req.add_user_fea('age', 12, "INT")
    torchrec_req.add_user_fea('weight', 129.8, "FLOAT")
    # Items to score.
    torchrec_req.add_item_id('item_0001')
    torchrec_req.add_item_id('item_0002')
    torchrec_req.add_item_id('item_0003')
    # List and map user features.
    torchrec_req.add_user_fea("raw_3", [0.24689289764507472, 0.005758482924454689, 0.6765301324940026, 0.18137273055602343], "list<double>")
    torchrec_req.add_user_fea("raw_4", [0.9965264740966043, 0.659596586238391, 0.16396649403055896, 0.08364986620265635], "list<double>")
    torchrec_req.add_user_fea("map_1", {"0": 0.37845234405201145}, "map<int,float>")
    torchrec_req.add_user_fea("map_2", {"866": 4143, "1627": 2451}, "map<int,int>")
    # One context feature value per item, in the order the items were added.
    torchrec_req.add_context_fea("id_2", [866], "list<int>")
    torchrec_req.add_context_fea("id_2", [7022, 1], "list<int>")
    torchrec_req.add_context_fea("id_2", [7022, 1], "list<int>")
    # Nested list user feature.
    torchrec_req.add_user_fea("click", [[0.94433516, 0.49145547], [0.94433516, 0.49145597]], "list<list<float>>")
    res = client.predict(torchrec_req)
    print(res)
The key parameters are as follows:
endpoint: Set this parameter to your service endpoint. For example, http://175805416243****.cn-beijing.pai-eas.aliyuncs.com/.
<your_service_name>: Replace this placeholder with your service name.
<your_service_token>: Replace this placeholder with your service token. For example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
Bypassing FG (fg_mode='bypass')
EAS Java SDK
Before you run the code, configure your Maven environment. For details, see Java SDK Usage Instructions. For the latest SDK version, see the project on GitHub. The following sample code sends a request to the alirec_rank_no_fg service:
package com.aliyun.openservices.eas.predict;
import java.util.List;
import java.util.Arrays;
import com.aliyun.openservices.eas.predict.http.PredictClient;
import com.aliyun.openservices.eas.predict.http.HttpConfig;
import com.aliyun.openservices.eas.predict.request.TorchDataType;
import com.aliyun.openservices.eas.predict.request.TorchRequest;
import com.aliyun.openservices.eas.predict.response.TorchResponse;
public class Test_Torch {
    public static PredictClient InitClient() {
        return new PredictClient(new HttpConfig());
    }
    public static TorchRequest buildPredictRequest() {
        TorchRequest request = new TorchRequest();
        // 2304000 = 300 * 10 * 768 float values for input 1.
        float[] content = new float[2304000];
        for (int i = 0; i < content.length; i++) {
            content[i] = (float) 0.0;
        }
        // 900 = 300 * 3 int64 values for input 0.
        long[] content_i = new long[900];
        for (int i = 0; i < content_i.length; i++) {
            content_i[i] = 0;
        }
        long[] a = Arrays.copyOfRange(content_i, 0, 300);
        float[] b = Arrays.copyOfRange(content, 0, 230400);
        // Each feed takes an input index, a data type, a shape, and the flattened data.
        request.addFeed(0, TorchDataType.DT_INT64, new long[]{300, 3}, content_i);
        request.addFeed(1, TorchDataType.DT_FLOAT, new long[]{300, 10, 768}, content);
        request.addFeed(2, TorchDataType.DT_FLOAT, new long[]{300, 768}, b);
        request.addFeed(3, TorchDataType.DT_INT64, new long[]{300}, a);
        request.addFetch(0);
        // debug_level 903 logs the prediction time for each stage.
        request.setDebugLevel(903);
        return request;
    }
    public static void main(String[] args) throws Exception {
        PredictClient client = InitClient();
        client.setToken("tokenGeneratedFromService");
        client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com");
        client.setModelName("alirec_rank_no_fg");
        client.setIsCompressed(false);
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            TorchResponse response = null;
            try {
                response = client.predict(buildPredictRequest());
                List<Float> result = response.getFloatVals(0);
                System.out.print("Predict Result: [");
                for (int j = 0; j < result.size(); j++) {
                    System.out.print(result.get(j).floatValue());
                    if (j != result.size() - 1) {
                        System.out.print(", ");
                    }
                }
                System.out.print("]\n");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");
        client.shutdown();
    }
}
The key parameters are as follows:
client.setToken("tokenGeneratedFromService"): Replace the value in parentheses with your service token. For example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com"): Replace the value in parentheses with your service endpoint. For example, 175805416243****.cn-beijing.pai-eas.aliyuncs.com.
client.setModelName("alirec_rank_no_fg"): Replace the value in parentheses with your service name.
EAS Python SDK
Before you run the code, run the pip install -U eas-prediction --user command to install or update the eas-prediction library. For more configuration details, see Python SDK Instructions. The sample code for requesting the alirec_rank_no_fg service is as follows:
from eas_prediction import PredictClient
from eas_prediction import TorchRequest
# snappy data
req = TorchRequest(False)
req.add_feed(0, [300, 3], TorchRequest.DT_INT64, [1] * 900)
req.add_feed(1, [300, 10, 768], TorchRequest.DT_FLOAT, [1.0] * 3 * 768000)
req.add_feed(2, [300, 768], TorchRequest.DT_FLOAT, [1.0] * 3 * 76800)
req.add_feed(3, [300], TorchRequest.DT_INT64, [1] * 300)
client = PredictClient('<your_endpoint>', '<your_service_name>')
client.set_token('<your_service_token>')
client.init()
resp = client.predict(req)
print(resp)
The key parameters are as follows:
<your_endpoint>: Replace this placeholder with your service endpoint. For example, http://175805416243****.cn-beijing.pai-eas.aliyuncs.com/.
<your_service_name>: Replace this placeholder with your service name.
<your_service_token>: Replace this placeholder with your service token. For example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
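In the bypass examples, each feed passes a shape together with a flat list of values, and the list lengths match the product of the shape dimensions (for example, 300 × 3 = 900). The following sketch checks this on the client side before a request is built. The check_feed helper is hypothetical and is not part of the eas-prediction library; whether the SDK or the service performs a similar check is not stated here.

```python
import math


def check_feed(shape, data):
    """Raise ValueError unless len(data) equals the product of the shape dims."""
    expected = math.prod(shape)
    if len(data) != expected:
        raise ValueError(
            f"shape {shape} needs {expected} values, got {len(data)}")
    return expected


if __name__ == "__main__":
    # The same shapes and data sizes as the Python bypass example.
    feeds = [
        ([300, 3], [1] * 900),
        ([300, 10, 768], [1.0] * 3 * 768000),
        ([300, 768], [1.0] * 3 * 76800),
        ([300], [1] * 300),
    ]
    for index, (shape, data) in enumerate(feeds):
        print(index, shape, check_feed(shape, data))
```

Running a check like this before calling add_feed surfaces a shape mismatch locally instead of as a failed request.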
For details on the status codes that the service returns, see Service Status Codes. To construct a service request, see the Request Format.
Request format
To build a service request yourself instead of using an SDK, generate the client code from the following Protobuf definitions:
pytorch_predict.proto: Torch model request
syntax = "proto3";
package pytorch.eas;
option cc_enable_arenas = true;
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "TorchPredictProtos";
enum ArrayDataType {
// Not a legal value for DataType. Used to indicate a DataType field
// has not been set.
DT_INVALID = 0;
// Data types that all computation devices are expected to support.
DT_FLOAT = 1;
DT_DOUBLE = 2;
DT_INT32 = 3;
DT_UINT8 = 4;
DT_INT16 = 5;
DT_INT8 = 6;
DT_STRING = 7;
DT_COMPLEX64 = 8; // Single-precision complex
DT_INT64 = 9;
DT_BOOL = 10;
DT_QINT8 = 11; // Quantized int8
DT_QUINT8 = 12; // Quantized uint8
DT_QINT32 = 13; // Quantized int32
DT_BFLOAT16 = 14; // Float32 truncated to 16 bits. Only for cast ops.
DT_QINT16 = 15; // Quantized int16
DT_QUINT16 = 16; // Quantized uint16
DT_UINT16 = 17;
DT_COMPLEX128 = 18; // Double-precision complex
DT_HALF = 19;
DT_RESOURCE = 20;
DT_VARIANT = 21; // Arbitrary C++ data types
}
// Dimensions of an array.
message ArrayShape {
repeated int64 dim = 1 [packed = true];
}
// Protocol buffer representing an array.
message ArrayProto {
// Data type.
ArrayDataType dtype = 1;
// Shape of the array.
ArrayShape array_shape = 2;
// DT_FLOAT.
repeated float float_val = 3 [packed = true];
// DT_DOUBLE.
repeated double double_val = 4 [packed = true];
// DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
repeated int32 int_val = 5 [packed = true];
// DT_STRING.
repeated bytes string_val = 6;
// DT_INT64.
repeated int64 int64_val = 7 [packed = true];
}
message PredictRequest {
// Input tensors.
repeated ArrayProto inputs = 1;
// Output filter.
repeated int32 output_filter = 2;
// Input tensors for recommendation.
map<string, ArrayProto> map_inputs = 3;
// Debug level for recommendation.
int32 debug_level = 100;
}
// Response to a successful PredictRequest.
message PredictResponse {
// Output tensors.
repeated ArrayProto outputs = 1;
// Output tensors for recommendation.
map<string, ArrayProto> map_outputs = 2;
}
torchrec_predict.proto: Torch model with FG request
syntax = "proto3";
option go_package = ".;torch_predict_protos";
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "TorchRecPredictProtos";
package com.alibaba.pairec.processor;
import "pytorch_predict.proto";
// Mappings from int64 to other types.
message LongStringMap {
map<int64, string> map_field = 1;
}
message LongIntMap {
map<int64, int32> map_field = 1;
}
message LongLongMap {
map<int64, int64> map_field = 1;
}
message LongFloatMap {
map<int64, float> map_field = 1;
}
message LongDoubleMap {
map<int64, double> map_field = 1;
}
// Mappings from string to other types.
message StringStringMap {
map<string, string> map_field = 1;
}
message StringIntMap {
map<string, int32> map_field = 1;
}
message StringLongMap {
map<string, int64> map_field = 1;
}
message StringFloatMap {
map<string, float> map_field = 1;
}
message StringDoubleMap {
map<string, double> map_field = 1;
}
// Mappings from int32 to other types.
message IntStringMap {
map<int32, string> map_field = 1;
}
message IntIntMap {
map<int32, int32> map_field = 1;
}
message IntLongMap {
map<int32, int64> map_field = 1;
}
message IntFloatMap {
map<int32, float> map_field = 1;
}
message IntDoubleMap {
map<int32, double> map_field = 1;
}
// Single-level list types.
message IntList {
repeated int32 features = 1;
}
message LongList {
repeated int64 features = 1;
}
message FloatList {
repeated float features = 1;
}
message DoubleList {
repeated double features = 1;
}
message StringList {
repeated string features = 1;
}
// Nested list types.
message IntLists {
repeated IntList lists = 1;
}
message LongLists {
repeated LongList lists = 1;
}
message FloatLists {
repeated FloatList lists = 1;
}
message DoubleLists {
repeated DoubleList lists = 1;
}
message StringLists {
repeated StringList lists = 1;
}
message PBFeature {
oneof value {
int32 int_feature = 1;
int64 long_feature = 2;
string string_feature = 3;
float float_feature = 4;
double double_feature=5;
LongStringMap long_string_map = 6;
LongIntMap long_int_map = 7;
LongLongMap long_long_map = 8;
LongFloatMap long_float_map = 9;
LongDoubleMap long_double_map = 10;
StringStringMap string_string_map = 11;
StringIntMap string_int_map = 12;
StringLongMap string_long_map = 13;
StringFloatMap string_float_map = 14;
StringDoubleMap string_double_map = 15;
IntStringMap int_string_map = 16;
IntIntMap int_int_map = 17;
IntLongMap int_long_map = 18;
IntFloatMap int_float_map = 19;
IntDoubleMap int_double_map = 20;
IntList int_list = 21;
LongList long_list =22;
StringList string_list = 23;
FloatList float_list = 24;
DoubleList double_list = 25;
IntLists int_lists = 26;
LongLists long_lists =27;
StringLists string_lists = 28;
FloatLists float_lists = 29;
DoubleLists double_lists = 30;
}
}
// Context features.
message ContextFeatures {
repeated PBFeature features = 1;
}
// PBRequest specifies the request for the aggregator.
message PBRequest {
// Debug mode.
int32 debug_level = 1;
// User features, keyed by the user input name.
map<string, PBFeature> user_features = 2;
// Item IDs.
repeated string item_ids = 3;
// Context features for each item, keyed by the context input name.
map<string, ContextFeatures> context_features = 4;
// Number of nearest neighbors (items) to retrieve from Faiss.
int32 faiss_neigh_num = 5;
// Item features for each item, keyed by the item input name.
map<string, ContextFeatures> item_features = 6;
// Optional metadata.
map<string, string> meta_data = 7;
}
// PBResponse specifies the response from the aggregator.
message PBResponse {
// Output tensors from the Torch model.
map<string, pytorch.eas.ArrayProto> map_outputs = 1;
// Output features generated by FG.
map<string, string> generate_features = 2;
// All input features processed by FG.
map<string, string> raw_features = 3;
// Item IDs.
repeated string item_ids = 4;
}
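The PBFeature oneof follows a regular naming pattern: scalars end in _feature, maps are named after their key and value types, and lists and nested lists end in _list and _lists. The following sketch derives the oneof field name from a type string such as those used in the SDK examples. It is based only on the field names in the definition above; pb_field_name is a hypothetical helper, and the SDKs may resolve type strings differently (for example, by storing an "int" value in long_feature).

```python
import re

# Type spellings mapped to the prefixes used in PBFeature field names.
SCALAR_PREFIX = {
    "int": "int", "int32": "int",
    "long": "long", "int64": "long",
    "string": "string", "float": "float", "double": "double",
}
MAP_KEY_PREFIXES = ("int", "long", "string")  # key types defined in the proto


def _prefix(name):
    try:
        return SCALAR_PREFIX[name]
    except KeyError:
        raise ValueError(f"unsupported element type: {name}") from None


def pb_field_name(type_name):
    """Return the PBFeature oneof field that matches a type string."""
    normalized = type_name.replace(" ", "").lower()
    match = re.fullmatch(r"map<(\w+),(\w+)>", normalized)
    if match:
        key, value = _prefix(match.group(1)), _prefix(match.group(2))
        if key not in MAP_KEY_PREFIXES:
            raise ValueError(f"unsupported map key type: {match.group(1)}")
        return f"{key}_{value}_map"
    match = re.fullmatch(r"list<list<(\w+)>>", normalized)
    if match:
        return f"{_prefix(match.group(1))}_lists"
    match = re.fullmatch(r"list<(\w+)>", normalized)
    if match:
        return f"{_prefix(match.group(1))}_list"
    return f"{_prefix(normalized)}_feature"


if __name__ == "__main__":
    for name in ("int", "List<double>", "map<string,int>", "list<list<float>>"):
        print(name, "->", pb_field_name(name))
```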
The debug_level parameter is optional and intended for debugging only. The following table describes its values:
debug_level | Description |
0 | Performs a standard prediction. |
1 | In normal mode, validates request keys, performs shape validation on FG inputs and outputs, and returns the input and output features without performing a prediction. |
2 | In normal mode, validates request keys, performs shape validation on FG inputs and outputs, returns the input features, output features, and the model input tensor, and performs a prediction. |
3 | In normal mode, validates request keys, performs shape validation on FG inputs and outputs, and returns the output features without performing a prediction. |
100 | In normal mode, persists the prediction request to disk as a Protobuf file. The |
102 | In normal mode, performs vector recall: validates request keys, performs shape validation on FG inputs and outputs, and saves the input features, output features, the model input tensor, and the user embedding. |
903 | Logs the prediction time for each stage. |
904 | Validates the request and logs any missing feature fields. |
Service status codes
A TorchEasyRec service returns the following status codes. For more information about status codes returned by an EAS service, see Appendix: service status codes and common errors.
Status code | Description |
200 | The request was successful. |
400 | The request is invalid. |
500 | Prediction failed. Check the service log for details. |
Save and parse a Protobuf request
For processor version 1.12 and later, when you enable debug mode by setting debug=True in the PAI-REC engine request body, the processor saves the original request, item-side input features, and transformed item-side features to a Protobuf file on disk for feature analysis and validation. To use this feature, set the request_log_path parameter to a path on which an OSS location is mounted. For example:
"model_config": {
  "fg_mode": "normal",
  "fg_threads": 8,
  "request_log_path": "/request_log",
  "background_feature_thread_num": 8
},
"storage": [
  {
    "mount_path": "/request_log",
    "oss": {
      "path": "oss://my-bucket/my-model/myrequests/",
      "readOnly": false
    }
  },
  {
    "mount_path": "/home/admin/docker_ml/workspace/model/",
    "oss": {
      "path": "oss://my-bucket/my-model/20260316",
      "readOnly": false
    }
  }
]
The processor creates a date_hour subdirectory in the path specified by request_log_path and stores the request data there. Background threads write this data to disk asynchronously. The number of background threads is set by the model_config.background_feature_thread_num parameter, which defaults to 4. Increasing this value can improve write throughput. The Protobuf files written to disk are named <request_id>_<random_str>.pb.
Because OSS has limited write bandwidth, avoid sending excessive request traffic to the PAI-REC engine when debug mode is enabled. If writes fall behind, the model service's internal queue drops new requests.
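To locate the saved files for a specific request, you can recover the request ID from each file name. The following sketch assumes that the random suffix contains no underscore, so everything before the last underscore is the request ID; the source does not state this, and the helper names and sample paths are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def request_id_of(file_name):
    """Extract <request_id> from a <request_id>_<random_str>.pb file name."""
    stem = PurePosixPath(file_name).stem  # drops the directory and ".pb"
    request_id, separator, _random_str = stem.rpartition("_")
    if not separator or not request_id:
        raise ValueError(f"unexpected file name: {file_name}")
    return request_id


def group_by_request(paths):
    """Map each request ID to the list of files saved for it."""
    groups = defaultdict(list)
    for path in paths:
        groups[request_id_of(path)].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical paths; the real subdirectory is named by date and hour.
    sample = [
        "some_dir/req-001_ab12.pb",
        "some_dir/req-001_cd34.pb",
        "some_dir/req_002_ef56.pb",
    ]
    for request_id, files in group_by_request(sample).items():
        print(request_id, files)
```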
To parse the generated protobuf file, use EAS-Python-SDK version 0.35 or later, or EAS-Java-SDK version 2.0.29 or later. The following is a Python example:
from eas_prediction.torchrec_predict_pb2 import PBLogData
# Read the serialized log record that the processor wrote to disk.
with open('xxxx.pb', 'rb') as f:
    pb_data = f.read()
pb_log = PBLogData()
pb_log.ParseFromString(pb_data)
print(pb_log)  # Print the entire log
print(pb_log.request)  # Print the request
print(pb_log.raw_features)  # Print raw item-side features
print(pb_log.generate_features)  # Print generated item-side features
Model service warm-up
A model service may show response time spikes immediately after startup or an update due to software and hardware characteristics. To prevent these spikes, configure the warm-up feature for the processor. In easyrec-torch-1.5 and later versions, enable this feature by adding three parameters to model_config. For example:
"warmup_data_path": "/warmup",
"warmup_cnt_per_file": 20,
"warmup_pb_files_count": 64
In this example, warmup_data_path enables warm-up and sets the directory for warm-up files. warmup_cnt_per_file is the number of warm-up iterations per file; a higher value results in a more thorough warm-up. warmup_pb_files_count is the number of online requests to save as Protobuf files for warm-up; a higher value covers more data patterns. The files are saved to the warmup_data_path directory.
To persist the Protobuf files, configure an OSS mount in the storage section that points to warmup_data_path. For example:
"storage": [
  ...,
  {
    "mount_path": "/warmup",
    "oss": {
      "path": "oss://<your-warmup-pb-file-path>",
      "readOnly": false
    }
  }
]
On its first start after configuration, the processor captures and saves the number of live requests specified by warmup_pb_files_count. On subsequent restarts, it uses these saved protobuf files for warm-up.
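The two count parameters multiply: with the default values, a restart replays 64 files × 20 iterations = 1,280 warm-up inferences. The following sketch estimates the resulting warm-up time. The per-inference latency is a made-up figure, and the estimate assumes that warm-up requests run one after another, which the source does not confirm.

```python
def warmup_runs(warmup_pb_files_count=64, warmup_cnt_per_file=20):
    """Total warm-up inferences: every saved file is replayed cnt_per_file times."""
    return warmup_pb_files_count * warmup_cnt_per_file


def estimated_warmup_seconds(runs, avg_latency_ms):
    """Rough duration if the runs execute sequentially (an assumption)."""
    return runs * avg_latency_ms / 1000


if __name__ == "__main__":
    runs = warmup_runs()  # default values from the parameter table
    print("warm-up inferences:", runs)
    # 15 ms is a hypothetical average latency; measure your own service.
    print("estimated seconds:", estimated_warmup_seconds(runs, avg_latency_ms=15))
```

An estimate like this helps when choosing values: raising either parameter lengthens startup proportionally.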