The built-in TorchEasyRec processor in Elastic Algorithm Service (EAS) deploys recommendation models trained with TorchEasyRec or PyTorch as scoring services. It integrates feature engineering capabilities to deliver high-performance scoring by jointly optimizing feature engineering and the PyTorch model.
Architecture
The following figure shows the recommendation engine architecture based on the TorchEasyRec processor.
The TorchEasyRec processor contains the following modules:
- Item Feature Cache: Caches item-side features from FeatureStore in memory to reduce network overhead and request pressure on FeatureStore. This improves inference service performance. When item-side features include real-time features, FeatureStore handles synchronization.
- Feature Generator (FG): Generates features based on a configuration file. A single set of C++ code ensures consistent logic for offline and online feature processing.
- TorchModel: A PyTorch model exported as a ScriptedModel after training with TorchEasyRec or PyTorch.
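The role of the Item Feature Cache can be pictured with a toy sketch. The hypothetical `ItemFeatureCache` class below is only an illustration of the caching idea (the processor's real cache lives in C++ inside the service): item features are fetched from the backing store once, then served from memory until a TTL expires.

```python
import time

class ItemFeatureCache:
    """Toy in-memory item feature cache with TTL; conceptual illustration only."""

    def __init__(self, loader, ttl_seconds=600):
        self._loader = loader   # fallback fetch, standing in for a FeatureStore lookup
        self._ttl = ttl_seconds
        self._store = {}        # item_id -> (expires_at, features)

    def get(self, item_id):
        now = time.time()
        entry = self._store.get(item_id)
        if entry and entry[0] > now:
            return entry[1]                      # cache hit: no network round trip
        features = self._loader(item_id)         # cache miss: fetch from the store
        self._store[item_id] = (now + self._ttl, features)
        return features

# Usage: the lambda stands in for a FeatureStore client call.
cache = ItemFeatureCache(loader=lambda item_id: {"item_id": item_id, "price": 9.9})
print(cache.get("7033"))  # first call loads; later calls within the TTL hit memory
```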
Limitations
CPU services support only the general-purpose instance families g6, g7, and g8. See general-purpose instance families (g series). GPU services support models such as T4 and A10; for GPU services, make sure the CUDA driver version is 535 or later.
Version history
The TorchEasyRec processor is under active development. Use the latest version to deploy inference services. New versions provide additional features and improved performance. Released versions:
| Processor name | Release date | Torch version | FG version | New features |
| --- | --- | --- | --- | --- |
| easyrec-torch-0.1 | 20240910 | 2.4 | 0.2.9 | |
| easyrec-torch-0.2 | 20240930 | 2.4 | 0.2.9 | |
| easyrec-torch-0.3 | 20241014 | 2.4 | 0.2.9 | |
| easyrec-torch-0.4 | 20241028 | 2.4 | 0.3.1 | |
| easyrec-torch-0.5 | 20241114 | 2.4 | 0.3.1 | |
| easyrec-torch-0.6 | 20241118 | 2.4 | 0.3.6 | |
| easyrec-torch-0.7 | 20241206 | 2.5 | 0.3.9 | |
| easyrec-torch-0.8 | 20241225 | 2.5 | 0.3.9 | |
| easyrec-torch-0.9 | 20250115 | 2.5 | 0.4.1 | |
| easyrec-torch-1.0 | 20250206 | 2.5 | 0.4.2 | |
| easyrec-torch-1.1 | 20250423 | 2.5 | 0.5.9 | |
| easyrec-torch-1.2 | 20250512 | 2.5 | 0.6.0 | |
| easyrec-torch-1.3 | 20250529 | 2.5 | 0.6.5 | |
| easyrec-torch-1.4 | 20250715 | 2.5 | 0.6.9 | |
| easyrec-torch-1.5 | 20250918 | 2.5 | 0.7.3 | |
| easyrec-torch-1.6 | 20251021 | 2.5 | 0.7.4 | |
| easyrec-torch-1.7 | 20251104 | 2.5 | 0.7.4 | |
| easyrec-torch-1.8 | 20251201 | 2.5 | 0.7.4 | |
| easyrec-torch-1.9 | 20260109 | 2.5 | 1.0.0 | |
| easyrec-torch-1.10 | 20260123 | 2.5 | 1.0.1 | |
| easyrec-torch-1.11 | 20260210 | 2.5 | 1.0.1 | |
| easyrec-torch-1.12 | 20260313 | 2.5 | 1.0.1 | |
| easyrec-torch-2.0 | 20260317 | 2.8 | 1.0.1 | |

Note: The GLIBC version of the EAS backend base image was upgraded in easyrec-torch-2.0. Verify environment compatibility before deploying version 2.0 or later.
Deploy a service
1. Prepare the service configuration file torcheasyrec.json.

   Set the processor type to easyrec-torch-{version}. For {version}, see Version history. The following examples show JSON configurations:

   Example: Using FG (fg_mode='normal')

   {
     "metadata": {
       "instance": 1,
       "name": "alirec_rank_with_fg",
       "rpc": {
         "enable_jemalloc": 1,
         "max_queue_size": 256,
         "worker_threads": 16
       }
     },
     "cloud": {
       "computing": {
         "instance_type": "ecs.gn6i-c16g1.4xlarge"
       }
     },
     "model_config": {
       "fg_mode": "normal",
       "fg_threads": 8,
       "region": "YOUR_REGION",
       "fs_project": "YOUR_FS_PROJECT",
       "fs_model": "YOUR_FS_MODEL",
       "fs_entity": "item",
       "load_feature_from_offlinestore": true,
       "access_key_id": "YOUR_ACCESS_KEY_ID",
       "access_key_secret": "YOUR_ACCESS_KEY_SECRET"
     },
     "storage": [
       {
         "mount_path": "/home/admin/docker_ml/workspace/model/",
         "oss": {
           "path": "oss://xxx/xxx/export",
           "readOnly": false
         },
         "properties": {
           "resource_type": "code"
         }
       }
     ],
     "processor": "easyrec-torch-1.12"
   }

   Example: Not using FG (fg_mode='bypass')

   {
     "metadata": {
       "instance": 1,
       "name": "alirec_rank_no_fg",
       "rpc": {
         "enable_jemalloc": 1,
         "max_queue_size": 256,
         "worker_threads": 16
       }
     },
     "cloud": {
       "computing": {
         "instance_type": "ecs.gn6i-c16g1.4xlarge"
       }
     },
     "model_config": {
       "fg_mode": "bypass"
     },
     "storage": [
       {
         "mount_path": "/home/admin/docker_ml/workspace/model/",
         "oss": {
           "path": "oss://xxx/xxx/export",
           "readOnly": false
         },
         "properties": {
           "resource_type": "code"
         }
       }
     ],
     "processor": "easyrec-torch-1.12"
   }

   The following table describes key parameters. For other parameters, see JSON deployment.
| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| processor | Yes | The TorchEasyRec processor. | "processor": "easyrec-torch-1.12" |
| path | Yes | The OSS path mounted to service storage for storing model files. | "path": "oss://examplebucket/xxx/export" |
| fg_mode | No | Feature engineering mode. Valid values: bypass (default): does not use FG; only the Torch model is deployed. Suitable for scenarios with custom feature processing; in this mode, the FeatureStore access parameters are not required. normal: uses FG; typically used with models trained by TorchEasyRec. | "fg_mode": "normal" |
| fg_threads | No | Number of concurrent threads that run FG for a single request. | "fg_threads": 15 |
| outputs | No | Names of the output variables predicted by the Torch model, such as probs_ctr. Separate multiple names with commas (,). By default, all variables are output. | "outputs": "probs_ctr,probs_cvr" |
| item_empty_score | No | Default score when an item ID does not exist. Default value: 0. | "item_empty_score": -1 |

Processor recall parameters

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| faiss_neigh_num | No | Number of vectors to recall using FAISS. By default, this value comes from the faiss_neigh_num field in the request body. If not provided, the faiss_neigh_num value in model_config is used. Default: 1. | "faiss_neigh_num": 200 |
| faiss_nprobe | No | Number of clusters to scan during retrieval. The inverted file (posting list) index in FAISS divides the data into many small clusters and maintains a posting list for each. A larger nprobe usually increases search precision but also computing cost and search time; a smaller value reduces precision but speeds up the search. Default: 800. | "faiss_nprobe": 700 |

Parameters for the processor to access FeatureStore

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| fs_project | No | FeatureStore project name. Required when you use FeatureStore. See Configure a FeatureStore project. | "fs_project": "fs_demo" |
| fs_model | No | Model feature name in FeatureStore. | "fs_model": "fs_rank_v1" |
| fs_entity | No | Entity name in FeatureStore. | "fs_entity": "item" |
| region | No | Region where FeatureStore resides, for example, cn-beijing for China (Beijing). See Endpoints. | "region": "cn-beijing" |
| access_key_id | No | AccessKey ID for FeatureStore. | "access_key_id": "xxxxx" |
| access_key_secret | No | AccessKey secret for FeatureStore. | "access_key_secret": "xxxxx" |
| load_feature_from_offlinestore | No | Whether to obtain offline feature data directly from FeatureStore OfflineStore. Valid values: true: obtain data from FeatureStore OfflineStore. false (default): obtain data from FeatureStore OnlineStore. | "load_feature_from_offlinestore": true |
| featuredb_username | No | Username for FeatureDB. | "featuredb_username": "xxx" |
| featuredb_password | No | Password for FeatureDB. | "featuredb_password": "xxx" |

Parameters for automatic feature expansion (input_tile)

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| INPUT_TILE | No | Enables automatic feature expansion. For features that have the same value within a single request, such as user_id, only one value needs to be passed. This reduces request size, network transmission time, and computation time. This feature must be used in normal mode with TorchEasyRec, and the corresponding environment variable must be set when you export the model. The system reads the INPUT_TILE value from model_acc.json in the model directory exported from TorchEasyRec; if the file does not exist, the system reads the environment variable. When the value is 2, FG for user-side features is computed only once. When the value is 3, FG for user-side features is computed only once, and embeddings for user and item are computed separately, with the user-side embedding computed only once; this suits scenarios with many user-side features. | "processor_envs": [{"name": "INPUT_TILE", "value": "2"}] |
| NO_GRAD_GUARD | No | Disables gradient calculation during inference, which stops operation tracking and prevents computation graph construction. Note: when set to 1, some models may be incompatible. If inference stutters on the second run, add the environment variable PYTORCH_TENSOREXPR_FALLBACK=2 to resolve this; it skips the compilation step while retaining some graph optimization features. | "processor_envs": [{"name": "NO_GRAD_GUARD", "value": "1"}] |

Model warm-up parameters

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| warmup_data_path | No | Enables model warm-up and specifies the path where warm-up files are saved. To ensure warm-up files are not lost, add an OSS mount for this path in the storage configuration. | "warmup_data_path": "/warmup" |
| warmup_cnt_per_file | No | Number of warm-up runs for each warm-up protobuf file. Increasing this value ensures sufficient warm-up but extends the ramp-up period. Default: 20. | "warmup_cnt_per_file": 20 |
| warmup_pb_files_count | No | Number of online requests to save as protobuf files for warm-up at the next startup. The save path is specified by the warmup_data_path parameter. Default: 64. | "warmup_pb_files_count": 64 |

Slow request logging and saving

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| long_request_threshold | No | Time threshold for slow requests, in ms. For requests that exceed this threshold, the running time of each stage is automatically recorded in the log. Default: 200. | "long_request_threshold": 200 |
| save_long_request | No | Whether to save a request as a protobuf file when it exceeds long_request_threshold. Default: false. | "save_long_request": true |

Writing original requests and item features to OSS storage

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| request_log_path | No | Path where protobuf files are saved to disk. Use an OSS mount for this path in the model service configuration. | "request_log_path": "/online_log_pb" |
| background_feature_thread_num | No | Number of background threads that save data to disk. If the disk-saving workload is heavy, increase this value to speed up protobuf file saving. Default: 4. | "background_feature_thread_num": 8 |
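The expansion that INPUT_TILE performs can be pictured outside the processor: user-side features arrive once per request and are repeated across the item batch before scoring. The `tile_user_features` helper below is a hypothetical conceptual sketch, not the processor's actual implementation:

```python
def tile_user_features(user_features, item_ids):
    """Expand single-valued user-side features to one scoring row per item.

    Conceptual illustration of INPUT_TILE: the caller sends user features once,
    and the server repeats them for every candidate item.
    """
    batch = []
    for item_id in item_ids:
        row = dict(user_features)   # user-side values repeated for each item
        row["item_id"] = item_id
        batch.append(row)
    return batch

rows = tile_user_features({"user_id": 33981}, ["7033", "7034", "7035"])
print(len(rows))  # one row per item, while user_id was transmitted only once
```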
2. Deploy the TorchEasyRec model service using one of the following methods:

   On-premises deployment using JSON (recommended)

   Procedure:

   1. Log on to the PAI console. Select a region at the top of the page, select the desired workspace, and click Elastic Algorithm Service (EAS).
   2. On the Elastic Algorithm Service (EAS) page, click Deploy Service, and then in the Custom Model Deployment section, click JSON On-Premises Deployment.
   3. In the JSON text box, enter the prepared JSON configuration and click Deploy.

   Deployment using eascmd

   1. Download and authenticate the client. This topic uses the Windows 64-bit version as an example.
   2. Open a terminal. In the directory that contains the JSON file, run the following command to create the service (see Command reference):

      eascmdwin64.exe create <service.json>

      Replace <service.json> with the name of the JSON file you created, such as torcheasyrec.json.
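For repeatable deployments, the torcheasyrec.json file passed to eascmd can also be generated from code. A minimal sketch, using only parameters described above; all names and paths are placeholders:

```python
import json

# Assemble a minimal TorchEasyRec service configuration (placeholder values).
config = {
    "metadata": {
        "instance": 1,
        "name": "alirec_rank_with_fg",
        "rpc": {"enable_jemalloc": 1, "max_queue_size": 256, "worker_threads": 16},
    },
    "model_config": {"fg_mode": "normal", "fg_threads": 8},
    "storage": [
        {
            "mount_path": "/home/admin/docker_ml/workspace/model/",
            "oss": {"path": "oss://examplebucket/xxx/export", "readOnly": False},
            "properties": {"resource_type": "code"},
        }
    ],
    "processor": "easyrec-torch-1.12",
}

# Write the file that `eascmdwin64.exe create torcheasyrec.json` would consume.
with open("torcheasyrec.json", "w") as f:
    json.dump(config, f, indent=2)
```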
Invoke a service
After deploying the TorchEasyRec model service, view service invocation information:
1. Log on to the PAI console, select the region at the top of the page and the workspace, and then click Enter EAS.
2. In the Service Method column of the target service, click Invocation Information to view the service endpoint and token.

The input and output of the TorchEasyRec model service are in protobuf format. The invocation method depends on whether FG is used:
Using FG (fg_mode='normal')
Two invocation methods are supported:
EAS Java SDK
Before running the code, configure the Maven environment. See Use the Java SDK. For the latest Java SDK version, see https://github.com/pai-eas/eas-java-sdk. The following example shows how to request the alirec_rank_with_fg service:
package com.aliyun.openservices.eas.predict;

import com.aliyun.openservices.eas.predict.http.Compressor;
import com.aliyun.openservices.eas.predict.http.HttpConfig;
import com.aliyun.openservices.eas.predict.http.PredictClient;
import com.aliyun.openservices.eas.predict.proto.TorchRecPredictProtos;
import com.aliyun.openservices.eas.predict.request.TorchRecRequest;
import com.aliyun.openservices.eas.predict.proto.TorchPredictProtos.ArrayProto;
import java.util.*;

public class TorchRecPredictTest {
    public static PredictClient InitClient() {
        return new PredictClient(new HttpConfig());
    }

    public static TorchRecRequest buildPredictRequest() {
        TorchRecRequest request = new TorchRecRequest();
        request.appendItemId("7033");
        request.addUserFeature("user_id", 33981, "int");

        ArrayList<Double> list = new ArrayList<>();
        list.add(0.24689289764507472);
        list.add(0.005758482924454689);
        list.add(0.6765301324940026);
        list.add(0.18137273055602343);
        request.addUserFeature("raw_3", list, "List<double>");

        Map<String, Integer> myMap = new LinkedHashMap<>();
        myMap.put("866", 4143);
        myMap.put("1627", 2451);
        request.addUserFeature("map_1", myMap, "map<string,int>");

        ArrayList<ArrayList<Float>> list2 = new ArrayList<>();
        ArrayList<Float> innerList1 = new ArrayList<>();
        innerList1.add(1.1f);
        innerList1.add(2.2f);
        innerList1.add(3.3f);
        list2.add(innerList1);
        ArrayList<Float> innerList2 = new ArrayList<>();
        innerList2.add(4.4f);
        innerList2.add(5.5f);
        list2.add(innerList2);
        request.addUserFeature("click", list2, "list<list<float>>");

        request.addContextFeature("id_2", list, "List<double>");
        request.addContextFeature("id_2", list, "List<double>");
        System.out.println(request.request);
        return request;
    }

    public static void main(String[] args) throws Exception {
        PredictClient client = InitClient();
        client.setToken("tokenGeneratedFromService");
        client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com");
        client.setModelName("alirec_rank_with_fg");
        client.setRequestTimeout(100000);
        testInvoke(client);
        testDebugLevel(client);
        client.shutdown();
    }

    public static void testInvoke(PredictClient client) throws Exception {
        long startTime = System.currentTimeMillis();
        TorchRecPredictProtos.PBResponse response = client.predict(buildPredictRequest());
        for (Map.Entry<String, ArrayProto> entry : response.getMapOutputsMap().entrySet()) {
            System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");
    }

    public static void testDebugLevel(PredictClient client) throws Exception {
        long startTime = System.currentTimeMillis();
        TorchRecRequest request = buildPredictRequest();
        request.setDebugLevel(1);
        TorchRecPredictProtos.PBResponse response = client.predict(request);
        Map<String, String> genFeas = response.getGenerateFeaturesMap();
        for (String itemId : genFeas.keySet()) {
            System.out.println(itemId);
            System.out.println(genFeas.get(itemId));
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");
    }
}
Where:
- client.setToken("tokenGeneratedFromService"): Replace the argument with your service token, for example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
- client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com"): Replace the argument with your service endpoint.
- client.setModelName("alirec_rank_with_fg"): Replace the argument with your service name.
Using the EAS Python SDK
Before you run the code, run the pip install -U eas-prediction --user command to install or update the eas-prediction library. For more information, see Use the Python SDK. The following code provides an example:
from eas_prediction import PredictClient
from eas_prediction.torchrec_request import TorchRecRequest

if __name__ == '__main__':
    endpoint = 'http://localhost:6016'
    client = PredictClient(endpoint, '<YOUR_SERVICE_NAME>')
    client.set_token('<your_service_token>')
    client.init()

    torchrec_req = TorchRecRequest()
    torchrec_req.add_user_fea('user_id', 'u001d', "STRING")
    torchrec_req.add_user_fea('age', 12, "INT")
    torchrec_req.add_user_fea('weight', 129.8, "FLOAT")
    torchrec_req.add_item_id('item_0001')
    torchrec_req.add_item_id('item_0002')
    torchrec_req.add_item_id('item_0003')
    torchrec_req.add_user_fea("raw_3", [0.24689289764507472, 0.005758482924454689, 0.6765301324940026, 0.18137273055602343], "list<double>")
    torchrec_req.add_user_fea("raw_4", [0.9965264740966043, 0.659596586238391, 0.16396649403055896, 0.08364986620265635], "list<double>")
    torchrec_req.add_user_fea("map_1", {"0": 0.37845234405201145}, "map<int,float>")
    torchrec_req.add_user_fea("map_2", {"866": 4143, "1627": 2451}, "map<int,int>")
    torchrec_req.add_context_fea("id_2", [866], "list<int>")
    torchrec_req.add_context_fea("id_2", [7022, 1], "list<int>")
    torchrec_req.add_context_fea("id_2", [7022, 1], "list<int>")
    torchrec_req.add_user_fea("click", [[0.94433516, 0.49145547], [0.94433516, 0.49145597]], "list<list<float>>")

    res = client.predict(torchrec_req)
    print(res)
Where:
- endpoint: Set this to your service endpoint, for example, http://175805416243****.cn-beijing.pai-eas.aliyuncs.com/.
- <YOUR_SERVICE_NAME>: Replace this with your service name.
- <your_service_token>: Replace this with your service token, for example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
Not using FG (fg_mode='bypass')
Using the EAS Java SDK
Before you run the code, configure the Maven environment. For more information, see Use the Java SDK. Check the GitHub page for the latest version number of the SDK. The following code provides an example of how to request the alirec_rank_no_fg service:
package com.aliyun.openservices.eas.predict;

import java.util.List;
import java.util.Arrays;
import com.aliyun.openservices.eas.predict.http.PredictClient;
import com.aliyun.openservices.eas.predict.http.HttpConfig;
import com.aliyun.openservices.eas.predict.request.TorchDataType;
import com.aliyun.openservices.eas.predict.request.TorchRequest;
import com.aliyun.openservices.eas.predict.response.TorchResponse;

public class Test_Torch {
    public static PredictClient InitClient() {
        return new PredictClient(new HttpConfig());
    }

    public static TorchRequest buildPredictRequest() {
        TorchRequest request = new TorchRequest();
        float[] content = new float[2304000];
        for (int i = 0; i < content.length; i++) {
            content[i] = (float) 0.0;
        }
        long[] content_i = new long[900];
        for (int i = 0; i < content_i.length; i++) {
            content_i[i] = 0;
        }
        long[] a = Arrays.copyOfRange(content_i, 0, 300);
        float[] b = Arrays.copyOfRange(content, 0, 230400);
        request.addFeed(0, TorchDataType.DT_INT64, new long[]{300, 3}, content_i);
        request.addFeed(1, TorchDataType.DT_FLOAT, new long[]{300, 10, 768}, content);
        request.addFeed(2, TorchDataType.DT_FLOAT, new long[]{300, 768}, b);
        request.addFeed(3, TorchDataType.DT_INT64, new long[]{300}, a);
        request.addFetch(0);
        request.setDebugLevel(903);
        return request;
    }

    public static void main(String[] args) throws Exception {
        PredictClient client = InitClient();
        client.setToken("tokenGeneratedFromService");
        client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com");
        client.setModelName("alirec_rank_no_fg");
        client.setIsCompressed(false);
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            TorchResponse response = null;
            try {
                response = client.predict(buildPredictRequest());
                List<Float> result = response.getFloatVals(0);
                System.out.print("Predict Result: [");
                for (int j = 0; j < result.size(); j++) {
                    System.out.print(result.get(j).floatValue());
                    if (j != result.size() - 1) {
                        System.out.print(", ");
                    }
                }
                System.out.print("]\n");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");
        client.shutdown();
    }
}
Where:
- client.setToken("tokenGeneratedFromService"): Replace the argument with your service token, for example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
- client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com"): Replace the argument with your service endpoint.
- client.setModelName("alirec_rank_no_fg"): Replace the argument with your service name.
Using the EAS Python SDK
Before you run the code, run the pip install -U eas-prediction --user command to install or update the eas-prediction library. For more information, see Use the Python SDK. The following code provides an example of how to request the alirec_rank_no_fg service:
from eas_prediction import PredictClient
from eas_prediction import TorchRequest
# Disable Snappy compression for the request data
req = TorchRequest(False)
req.add_feed(0, [300, 3], TorchRequest.DT_INT64, [1] * 900)
req.add_feed(1, [300, 10, 768], TorchRequest.DT_FLOAT, [1.0] * 3 * 768000)
req.add_feed(2, [300, 768], TorchRequest.DT_FLOAT, [1.0] * 3 * 76800)
req.add_feed(3, [300], TorchRequest.DT_INT64, [1] * 300)
client = PredictClient('<your_endpoint>', '<your_service_name>')
client.set_token('<your_service_token>')
client.init()
resp = client.predict(req)
print(resp)
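As a sanity check on the bypass example above, each feed's flat data length must equal the product of its declared shape dimensions. The counts can be verified directly (shapes taken from the example):

```python
from math import prod

# Each feed's flat data length must equal the product of its shape dims.
shapes = {
    0: [300, 3],        # DT_INT64 -> 900 values
    1: [300, 10, 768],  # DT_FLOAT -> 2,304,000 values
    2: [300, 768],      # DT_FLOAT -> 230,400 values
    3: [300],           # DT_INT64 -> 300 values
}
counts = {idx: prod(shape) for idx, shape in shapes.items()}
print(counts)  # {0: 900, 1: 2304000, 2: 230400, 3: 300}
```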
Where:
- <your_endpoint>: Replace this with your service endpoint, for example, http://175805416243****.cn-beijing.pai-eas.aliyuncs.com/.
- <your_service_name>: Replace this with your service name.
- <your_service_token>: Replace this with your service token, for example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
For more information about the status codes returned when you access the service, see Service status code description. You can also build a service request manually. For more information, see Request format.
Request format
When a client calls the service, you can manually generate the prediction request code file based on the .proto file. To build the service request manually, refer to the following protobuf definitions to generate the corresponding code:
pytorch_predict.proto: Request definition for a Torch model
syntax = "proto3";
package pytorch.eas;
option cc_enable_arenas = true;
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "TorchPredictProtos";
enum ArrayDataType {
// Not a legal value for DataType. Used to indicate a DataType field
// has not been set.
DT_INVALID = 0;
// Data types that all computation devices are expected to be
// capable to support.
DT_FLOAT = 1;
DT_DOUBLE = 2;
DT_INT32 = 3;
DT_UINT8 = 4;
DT_INT16 = 5;
DT_INT8 = 6;
DT_STRING = 7;
DT_COMPLEX64 = 8; // Single-precision complex
DT_INT64 = 9;
DT_BOOL = 10;
DT_QINT8 = 11; // Quantized int8
DT_QUINT8 = 12; // Quantized uint8
DT_QINT32 = 13; // Quantized int32
DT_BFLOAT16 = 14; // Float32 truncated to 16 bits. Only for cast ops.
DT_QINT16 = 15; // Quantized int16
DT_QUINT16 = 16; // Quantized uint16
DT_UINT16 = 17;
DT_COMPLEX128 = 18; // Double-precision complex
DT_HALF = 19;
DT_RESOURCE = 20;
DT_VARIANT = 21; // Arbitrary C++ data types
}
// Dimensions of an array
message ArrayShape {
repeated int64 dim = 1 [packed = true];
}
// Protocol buffer representing an array
message ArrayProto {
// Data Type.
ArrayDataType dtype = 1;
// Shape of the array.
ArrayShape array_shape = 2;
// DT_FLOAT.
repeated float float_val = 3 [packed = true];
// DT_DOUBLE.
repeated double double_val = 4 [packed = true];
// DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
repeated int32 int_val = 5 [packed = true];
// DT_STRING.
repeated bytes string_val = 6;
// DT_INT64.
repeated int64 int64_val = 7 [packed = true];
}
message PredictRequest {
// Input tensors.
repeated ArrayProto inputs = 1;
// Output filter.
repeated int32 output_filter = 2;
// Input tensors for rec
map<string, ArrayProto> map_inputs = 3;
// debug_level for rec
int32 debug_level = 100;
}
// Response for PredictRequest on successful run.
message PredictResponse {
// Output tensors.
repeated ArrayProto outputs = 1;
// Output tensors for rec.
map<string, ArrayProto> map_outputs = 2;
}
torchrec_predict.proto: Request definition for a Torch model with FG
syntax = "proto3";
option go_package = ".;torch_predict_protos";
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "TorchRecPredictProtos";
package com.alibaba.pairec.processor;
import "pytorch_predict.proto";
//long->others
message LongStringMap {
map<int64, string> map_field = 1;
}
message LongIntMap {
map<int64, int32> map_field = 1;
}
message LongLongMap {
map<int64, int64> map_field = 1;
}
message LongFloatMap {
map<int64, float> map_field = 1;
}
message LongDoubleMap {
map<int64, double> map_field = 1;
}
//string->others
message StringStringMap {
map<string, string> map_field = 1;
}
message StringIntMap {
map<string, int32> map_field = 1;
}
message StringLongMap {
map<string, int64> map_field = 1;
}
message StringFloatMap {
map<string, float> map_field = 1;
}
message StringDoubleMap {
map<string, double> map_field = 1;
}
//int32->others
message IntStringMap {
map<int32, string> map_field = 1;
}
message IntIntMap {
map<int32, int32> map_field = 1;
}
message IntLongMap {
map<int32, int64> map_field = 1;
}
message IntFloatMap {
map<int32, float> map_field = 1;
}
message IntDoubleMap {
map<int32, double> map_field = 1;
}
// list
message IntList {
repeated int32 features = 1;
}
message LongList {
repeated int64 features = 1;
}
message FloatList {
repeated float features = 1;
}
message DoubleList {
repeated double features = 1;
}
message StringList {
repeated string features = 1;
}
// lists
message IntLists {
repeated IntList lists = 1;
}
message LongLists {
repeated LongList lists = 1;
}
message FloatLists {
repeated FloatList lists = 1;
}
message DoubleLists {
repeated DoubleList lists = 1;
}
message StringLists {
repeated StringList lists = 1;
}
message PBFeature {
oneof value {
int32 int_feature = 1;
int64 long_feature = 2;
string string_feature = 3;
float float_feature = 4;
double double_feature=5;
LongStringMap long_string_map = 6;
LongIntMap long_int_map = 7;
LongLongMap long_long_map = 8;
LongFloatMap long_float_map = 9;
LongDoubleMap long_double_map = 10;
StringStringMap string_string_map = 11;
StringIntMap string_int_map = 12;
StringLongMap string_long_map = 13;
StringFloatMap string_float_map = 14;
StringDoubleMap string_double_map = 15;
IntStringMap int_string_map = 16;
IntIntMap int_int_map = 17;
IntLongMap int_long_map = 18;
IntFloatMap int_float_map = 19;
IntDoubleMap int_double_map = 20;
IntList int_list = 21;
LongList long_list =22;
StringList string_list = 23;
FloatList float_list = 24;
DoubleList double_list = 25;
IntLists int_lists = 26;
LongLists long_lists =27;
StringLists string_lists = 28;
FloatLists float_lists = 29;
DoubleLists double_lists = 30;
}
}
// context features
message ContextFeatures {
repeated PBFeature features = 1;
}
// PBRequest specifies the request for aggregator
message PBRequest {
// debug mode
int32 debug_level = 1;
// user features, key is user input name
map<string, PBFeature> user_features = 2;
// item ids
repeated string item_ids = 3;
// context features for each item, key is context input name
map<string, ContextFeatures> context_features = 4;
// number of nearest neighbors(items) to retrieve
// from faiss
int32 faiss_neigh_num = 5;
// item features for each item, key is item input name
map<string, ContextFeatures> item_features = 6;
// optional meta data
map<string, string> meta_data = 7;
}
// PBResponse specifies the response for aggregator
message PBResponse {
// torch output tensors
map<string, pytorch.eas.ArrayProto> map_outputs = 1;
// fg output features
map<string, string> generate_features = 2;
// all fg input features
map<string, string> raw_features = 3;
// item ids
repeated string item_ids = 4;
}
The following table describes the debug_level parameter. By default, you do not need to set this parameter; pass it only when debugging.

| debug_level | Description |
| --- | --- |
| 0 | The service performs prediction normally. |
| 1 | In normal mode, validates the request keys and the shapes of the FG input and output, and returns the input and output features without performing prediction. |
| 2 | In normal mode, validates the request keys and the shapes of the FG input and output, returns the input and output features along with the model input tensor, and performs prediction. |
| 3 | In normal mode, validates the request keys and the shapes of the FG input and output, and returns the output features without performing prediction. |
| 100 | In normal mode, saves the prediction request. The saved protobuf file contains the original request and the item-side input and output features. The save path is specified by the request_log_path parameter. |
| 102 | In normal mode, performs vector recall, validates the request keys and the shapes of the FG input and output, and saves the input and output features, the model input tensor, and the user embedding result. |
| 903 | Prints the prediction time for each stage. |
| 904 | Checks for missing feature fields in the request and records them in the log. |
Service status code description
The following table describes the main status codes that may be returned when you access a TorchEasyRec service. For more information about the status codes returned when you access an EAS service, see Appendix: Service status codes and common errors.
| Status code | Description |
| --- | --- |
| 200 | The service returns a normal response. |
| 400 | The request input is invalid. |
| 500 | The prediction failed. Check the service log for details. |
Save and parse request protobuf files
For processor version 1.12 and later, when debug=True is enabled for the request body of the PAI-Rec engine, the processor saves the original request and the input and output features of the item side to a protobuf file on disk. This supports subsequent feature analysis and verification. The protobuf file contains the original request data, the input features on the item side, and the transformed features on the item side. To use this feature, configure the request_log_path parameter to specify the save path and mount an OSS path to it. For example:
"model_config": {
"fg_mode": "normal",
"fg_threads": 8,
"request_log_path": "/request_log",
"background_feature_thread_num": 8
},
"storage": [
{
"mount_path": "/request_log",
"oss": {
"path": "oss://my-bucket/my-model/myrequests/",
"readOnly": false
}
},
{
"mount_path": "/home/admin/docker_ml/workspace/model/",
"oss": {
"path": "oss://my-bucket/my-model/20260316",
"readOnly": false
}
}
]
The processor creates a date_hour subdirectory in the path specified by request_log_path and saves the request data. Disk writes are performed asynchronously by a background thread. The number of background threads is configurable via the model_config.background_feature_thread_num parameter. The default value is 4. You can increase this value to improve write speed. The saved protobuf file name follows the format <request_id>_<random_str>.pb. Because OSS write bandwidth is limited, the traffic of request bodies with debug mode enabled in the PAI-Rec engine should remain moderate. If traffic is too high and writes fall behind, the internal queue of the model service discards new requests without saving them.
To parse the obtained protobuf files, use EAS-Python-SDK version 0.35 or later, or EAS-Java-SDK version 2.0.29 or later. The following code provides a Python example:
from eas_prediction.torchrec_predict_pb2 import PBLogData
with open('xxxx.pb', 'rb') as f:
    pb_data = f.read()
pb_log = PBLogData()
pb_log.ParseFromString(pb_data)
print(pb_log) # Print all logs
print(pb_log.request) # Print the request
print(pb_log.raw_features) # Print the raw item-side features
print(pb_log.generate_features) # Print the item-side features after feature generation (fg)