The built-in TorchEasyRec processor in Elastic Algorithm Service (EAS) deploys recommendation models trained with TorchEasyRec or PyTorch as scoring services. It integrates feature engineering capabilities to deliver high-performance scoring by jointly optimizing feature engineering and the PyTorch model.
Architecture
The following figure shows the recommendation engine architecture based on the TorchEasyRec processor.
The TorchEasyRec processor contains the following modules:
- Item Feature Cache: Caches item-side features from FeatureStore in memory to reduce network overhead and request pressure on FeatureStore. This improves inference service performance. When item-side features include real-time features, FeatureStore handles synchronization.
- Feature Generator (FG): Generates features based on a configuration file. A single set of C++ code ensures consistent logic for offline and online feature processing.
- TorchModel: A PyTorch model exported as a ScriptedModel after training with TorchEasyRec or PyTorch.
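The role of the Item Feature Cache can be pictured with a toy sketch. The hypothetical `ItemFeatureCache` class below is only an illustration of the caching idea (the processor's real cache lives in C++ inside the service): item features are fetched from the backing store once, then served from memory until a TTL expires.

```python
import time

class ItemFeatureCache:
    """Toy in-memory item feature cache with TTL; conceptual illustration only."""

    def __init__(self, loader, ttl_seconds=600):
        self._loader = loader   # fallback fetch, standing in for a FeatureStore lookup
        self._ttl = ttl_seconds
        self._store = {}        # item_id -> (expires_at, features)

    def get(self, item_id):
        now = time.time()
        entry = self._store.get(item_id)
        if entry and entry[0] > now:
            return entry[1]                      # cache hit: no network round trip
        features = self._loader(item_id)         # cache miss: fetch from the store
        self._store[item_id] = (now + self._ttl, features)
        return features

# Usage: the lambda stands in for a FeatureStore client call.
cache = ItemFeatureCache(loader=lambda item_id: {"item_id": item_id, "price": 9.9})
print(cache.get("7033"))  # first call loads; later calls within the TTL hit memory
```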
Limitations
CPU services support only the general-purpose instance families g6, g7, and g8. See general-purpose instance families (g series). GPU services support models such as T4 and A10; for GPU services, make sure the CUDA driver version is 535 or later.
Version history
The TorchEasyRec processor is under active development. Use the latest version to deploy inference services. New versions provide additional features and improved performance. Released versions:
| Processor name | Release date | Torch version | FG version | New features |
| --- | --- | --- | --- | --- |
| easyrec-torch-0.1 | 20240910 | 2.4 | 0.2.9 | |
| easyrec-torch-0.2 | 20240930 | 2.4 | 0.2.9 | |
| easyrec-torch-0.3 | 20241014 | 2.4 | 0.2.9 | |
| easyrec-torch-0.4 | 20241028 | 2.4 | 0.3.1 | |
| easyrec-torch-0.5 | 20241114 | 2.4 | 0.3.1 | |
| easyrec-torch-0.6 | 20241118 | 2.4 | 0.3.6 | |
| easyrec-torch-0.7 | 20241206 | 2.5 | 0.3.9 | |
| easyrec-torch-0.8 | 20241225 | 2.5 | 0.3.9 | |
| easyrec-torch-0.9 | 20250115 | 2.5 | 0.4.1 | |
| easyrec-torch-1.0 | 20250206 | 2.5 | 0.4.2 | |
| easyrec-torch-1.1 | 20250423 | 2.5 | 0.5.9 | |
| easyrec-torch-1.2 | 20250512 | 2.5 | 0.6.0 | |
| easyrec-torch-1.3 | 20250529 | 2.5 | 0.6.5 | |
| easyrec-torch-1.4 | 20250715 | 2.5 | 0.6.9 | |
| easyrec-torch-1.5 | 20250918 | 2.5 | 0.7.3 | |
| easyrec-torch-1.6 | 20251021 | 2.5 | 0.7.4 | |
| easyrec-torch-1.7 | 20251104 | 2.5 | 0.7.4 | |
| easyrec-torch-1.8 | 20251201 | 2.5 | 0.7.4 | |
| easyrec-torch-1.9 | 20260109 | 2.5 | 1.0.0 | |
| easyrec-torch-1.10 | 20260123 | 2.5 | 1.0.1 | |
| easyrec-torch-1.11 | 20260210 | 2.5 | 1.0.1 | |
| easyrec-torch-1.12 | 20260313 | 2.5 | 1.0.1 | |
| easyrec-torch-2.0 | 20260317 | 2.8 | 1.0.1 | |

Note: The GLIBC version of the EAS backend base image was upgraded in easyrec-torch-2.0. Verify environment compatibility before deploying version 2.0 or later.
Deploy a service
1. Prepare the service configuration file torcheasyrec.json.

   Set the processor type to easyrec-torch-{version}. For {version}, see Version history. The following examples show JSON configurations:

   Example: Using FG (fg_mode='normal')

   {
     "metadata": {
       "instance": 1,
       "name": "alirec_rank_with_fg",
       "rpc": {
         "enable_jemalloc": 1,
         "max_queue_size": 256,
         "worker_threads": 16
       }
     },
     "cloud": {
       "computing": {
         "instance_type": "ecs.gn6i-c16g1.4xlarge"
       }
     },
     "model_config": {
       "fg_mode": "normal",
       "fg_threads": 8,
       "region": "YOUR_REGION",
       "fs_project": "YOUR_FS_PROJECT",
       "fs_model": "YOUR_FS_MODEL",
       "fs_entity": "item",
       "load_feature_from_offlinestore": true,
       "access_key_id": "YOUR_ACCESS_KEY_ID",
       "access_key_secret": "YOUR_ACCESS_KEY_SECRET"
     },
     "storage": [
       {
         "mount_path": "/home/admin/docker_ml/workspace/model/",
         "oss": {
           "path": "oss://xxx/xxx/export",
           "readOnly": false
         },
         "properties": {
           "resource_type": "code"
         }
       }
     ],
     "processor": "easyrec-torch-1.12"
   }

   Example: Not using FG (fg_mode='bypass')

   {
     "metadata": {
       "instance": 1,
       "name": "alirec_rank_no_fg",
       "rpc": {
         "enable_jemalloc": 1,
         "max_queue_size": 256,
         "worker_threads": 16
       }
     },
     "cloud": {
       "computing": {
         "instance_type": "ecs.gn6i-c16g1.4xlarge"
       }
     },
     "model_config": {
       "fg_mode": "bypass"
     },
     "storage": [
       {
         "mount_path": "/home/admin/docker_ml/workspace/model/",
         "oss": {
           "path": "oss://xxx/xxx/export",
           "readOnly": false
         },
         "properties": {
           "resource_type": "code"
         }
       }
     ],
     "processor": "easyrec-torch-1.12"
   }

   The following table describes key parameters. For other parameters, see JSON deployment.
| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| processor | Yes | The TorchEasyRec processor. | "processor": "easyrec-torch-1.12" |
| path | Yes | The OSS path mounted to service storage for storing model files. | "path": "oss://examplebucket/xxx/export" |
| fg_mode | No | Feature engineering mode. Valid values: bypass (default): does not use FG; only the Torch model is deployed. Suitable for scenarios with custom feature processing; in this mode, the FeatureStore access parameters are not required. normal: uses FG; typically used with models trained by TorchEasyRec. | "fg_mode": "normal" |
| fg_threads | No | Number of concurrent threads that run FG for a single request. | "fg_threads": 15 |
| outputs | No | Names of the output variables predicted by the Torch model, such as probs_ctr. Separate multiple names with commas (,). By default, all variables are output. | "outputs": "probs_ctr,probs_cvr" |
| item_empty_score | No | Default score when an item ID does not exist. Default value: 0. | "item_empty_score": -1 |

Processor recall parameters

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| faiss_neigh_num | No | Number of vectors to recall using FAISS. By default, this value comes from the faiss_neigh_num field in the request body. If not provided, the faiss_neigh_num value in model_config is used. Default: 1. | "faiss_neigh_num": 200 |
| faiss_nprobe | No | Number of clusters to scan during retrieval. The inverted file (posting list) index in FAISS divides the data into many small clusters and maintains a posting list for each. A larger nprobe usually increases search precision but also computing cost and search time; a smaller value reduces precision but speeds up the search. Default: 800. | "faiss_nprobe": 700 |

Parameters for the processor to access FeatureStore

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| fs_project | No | FeatureStore project name. Required when you use FeatureStore. See Configure a FeatureStore project. | "fs_project": "fs_demo" |
| fs_model | No | Model feature name in FeatureStore. | "fs_model": "fs_rank_v1" |
| fs_entity | No | Entity name in FeatureStore. | "fs_entity": "item" |
| region | No | Region where FeatureStore resides, for example, cn-beijing for China (Beijing). See Endpoints. | "region": "cn-beijing" |
| access_key_id | No | AccessKey ID for FeatureStore. | "access_key_id": "xxxxx" |
| access_key_secret | No | AccessKey secret for FeatureStore. | "access_key_secret": "xxxxx" |
| load_feature_from_offlinestore | No | Whether to obtain offline feature data directly from FeatureStore OfflineStore. Valid values: true: obtain data from FeatureStore OfflineStore. false (default): obtain data from FeatureStore OnlineStore. | "load_feature_from_offlinestore": true |
| featuredb_username | No | Username for FeatureDB. | "featuredb_username": "xxx" |
| featuredb_password | No | Password for FeatureDB. | "featuredb_password": "xxx" |

Parameters for automatic feature expansion (input_tile)

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| INPUT_TILE | No | Enables automatic feature expansion. For features that have the same value within a single request, such as user_id, only one value needs to be passed. This reduces request size, network transmission time, and computation time. This feature must be used in normal mode with TorchEasyRec, and the corresponding environment variable must be set when you export the model. The system reads the INPUT_TILE value from model_acc.json in the model directory exported from TorchEasyRec; if the file does not exist, the system reads the environment variable. When the value is 2, FG for user-side features is computed only once. When the value is 3, FG for user-side features is computed only once, and embeddings for user and item are computed separately, with the user-side embedding computed only once; this suits scenarios with many user-side features. | "processor_envs": [{"name": "INPUT_TILE", "value": "2"}] |
| NO_GRAD_GUARD | No | Disables gradient calculation during inference, which stops operation tracking and prevents computation graph construction. Note: when set to 1, some models may be incompatible. If inference stutters on the second run, add the environment variable PYTORCH_TENSOREXPR_FALLBACK=2 to resolve this; it skips the compilation step while retaining some graph optimization features. | "processor_envs": [{"name": "NO_GRAD_GUARD", "value": "1"}] |

Model warm-up parameters

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| warmup_data_path | No | Enables model warm-up and specifies the path where warm-up files are saved. To ensure warm-up files are not lost, add an OSS mount for this path in the storage configuration. | "warmup_data_path": "/warmup" |
| warmup_cnt_per_file | No | Number of warm-up runs for each warm-up protobuf file. Increasing this value ensures sufficient warm-up but extends the ramp-up period. Default: 20. | "warmup_cnt_per_file": 20 |
| warmup_pb_files_count | No | Number of online requests to save as protobuf files for warm-up at the next startup. The save path is specified by the warmup_data_path parameter. Default: 64. | "warmup_pb_files_count": 64 |

Slow request logging and saving

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| long_request_threshold | No | Time threshold for slow requests, in ms. For requests that exceed this threshold, the running time of each stage is automatically recorded in the log. Default: 200. | "long_request_threshold": 200 |
| save_long_request | No | Whether to save a request as a protobuf file when it exceeds long_request_threshold. Default: false. | "save_long_request": true |

Writing original requests and item features to OSS storage

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| request_log_path | No | Path where protobuf files are saved to disk. Use an OSS mount for this path in the model service configuration. | "request_log_path": "/online_log_pb" |
| background_feature_thread_num | No | Number of background threads that save data to disk. If the disk-saving workload is heavy, increase this value to speed up protobuf file saving. Default: 4. | "background_feature_thread_num": 8 |
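The expansion that INPUT_TILE performs can be pictured outside the processor: user-side features arrive once per request and are repeated across the item batch before scoring. The `tile_user_features` helper below is a hypothetical conceptual sketch, not the processor's actual implementation:

```python
def tile_user_features(user_features, item_ids):
    """Expand single-valued user-side features to one scoring row per item.

    Conceptual illustration of INPUT_TILE: the caller sends user features once,
    and the server repeats them for every candidate item.
    """
    batch = []
    for item_id in item_ids:
        row = dict(user_features)   # user-side values repeated for each item
        row["item_id"] = item_id
        batch.append(row)
    return batch

rows = tile_user_features({"user_id": 33981}, ["7033", "7034", "7035"])
print(len(rows))  # one row per item, while user_id was transmitted only once
```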
2. Deploy the TorchEasyRec model service using one of the following methods:

   On-premises deployment using JSON (recommended)

   Procedure:

   1. Log on to the PAI console. Select a region at the top of the page, select the desired workspace, and click Elastic Algorithm Service (EAS).
   2. On the Elastic Algorithm Service (EAS) page, click Deploy Service, and then in the Custom Model Deployment section, click JSON On-Premises Deployment.
   3. In the JSON text box, enter the prepared JSON configuration and click Deploy.

   Deployment using eascmd

   1. Download and authenticate the client. This topic uses the Windows 64-bit version as an example.
   2. Open a terminal. In the directory that contains the JSON file, run the following command to create the service (see Command reference):

      eascmdwin64.exe create <service.json>

      Replace <service.json> with the name of the JSON file you created, such as torcheasyrec.json.
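For repeatable deployments, the torcheasyrec.json file passed to eascmd can also be generated from code. A minimal sketch, using only parameters described above; all names and paths are placeholders:

```python
import json

# Assemble a minimal TorchEasyRec service configuration (placeholder values).
config = {
    "metadata": {
        "instance": 1,
        "name": "alirec_rank_with_fg",
        "rpc": {"enable_jemalloc": 1, "max_queue_size": 256, "worker_threads": 16},
    },
    "model_config": {"fg_mode": "normal", "fg_threads": 8},
    "storage": [
        {
            "mount_path": "/home/admin/docker_ml/workspace/model/",
            "oss": {"path": "oss://examplebucket/xxx/export", "readOnly": False},
            "properties": {"resource_type": "code"},
        }
    ],
    "processor": "easyrec-torch-1.12",
}

# Write the file that `eascmdwin64.exe create torcheasyrec.json` would consume.
with open("torcheasyrec.json", "w") as f:
    json.dump(config, f, indent=2)
```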
Invoke a service
After deploying the TorchEasyRec model service, view service invocation information:
1. Log on to the PAI console, select the region at the top of the page and the workspace, and then click Enter EAS.
2. In the Service Method column of the target service, click Invocation Information to view the service endpoint and token.

The input and output of the TorchEasyRec model service are in protobuf format. The invocation method depends on whether FG is used:
Using FG (fg_mode='normal')
Two invocation methods are supported:
EAS Java SDK
Before running the code, configure the Maven environment. See Use the Java SDK. For the latest Java SDK version, see https://github.com/pai-eas/eas-java-sdk. The following example shows how to request the alirec_rank_with_fg service:
package com.aliyun.openservices.eas.predict;

import com.aliyun.openservices.eas.predict.http.Compressor;
import com.aliyun.openservices.eas.predict.http.HttpConfig;
import com.aliyun.openservices.eas.predict.http.PredictClient;
import com.aliyun.openservices.eas.predict.proto.TorchRecPredictProtos;
import com.aliyun.openservices.eas.predict.request.TorchRecRequest;
import com.aliyun.openservices.eas.predict.proto.TorchPredictProtos.ArrayProto;
import java.util.*;

public class TorchRecPredictTest {
    public static PredictClient InitClient() {
        return new PredictClient(new HttpConfig());
    }

    public static TorchRecRequest buildPredictRequest() {
        TorchRecRequest request = new TorchRecRequest();
        request.appendItemId("7033");
        request.addUserFeature("user_id", 33981, "int");

        ArrayList<Double> list = new ArrayList<>();
        list.add(0.24689289764507472);
        list.add(0.005758482924454689);
        list.add(0.6765301324940026);
        list.add(0.18137273055602343);
        request.addUserFeature("raw_3", list, "List<double>");

        Map<String, Integer> myMap = new LinkedHashMap<>();
        myMap.put("866", 4143);
        myMap.put("1627", 2451);
        request.addUserFeature("map_1", myMap, "map<string,int>");

        ArrayList<ArrayList<Float>> list2 = new ArrayList<>();
        ArrayList<Float> innerList1 = new ArrayList<>();
        innerList1.add(1.1f);
        innerList1.add(2.2f);
        innerList1.add(3.3f);
        list2.add(innerList1);
        ArrayList<Float> innerList2 = new ArrayList<>();
        innerList2.add(4.4f);
        innerList2.add(5.5f);
        list2.add(innerList2);
        request.addUserFeature("click", list2, "list<list<float>>");

        request.addContextFeature("id_2", list, "List<double>");
        request.addContextFeature("id_2", list, "List<double>");
        System.out.println(request.request);
        return request;
    }

    public static void main(String[] args) throws Exception {
        PredictClient client = InitClient();
        client.setToken("tokenGeneratedFromService");
        client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com");
        client.setModelName("alirec_rank_with_fg");
        client.setRequestTimeout(100000);
        testInvoke(client);
        testDebugLevel(client);
        client.shutdown();
    }

    public static void testInvoke(PredictClient client) throws Exception {
        long startTime = System.currentTimeMillis();
        TorchRecPredictProtos.PBResponse response = client.predict(buildPredictRequest());
        for (Map.Entry<String, ArrayProto> entry : response.getMapOutputsMap().entrySet()) {
            System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");
    }

    public static void testDebugLevel(PredictClient client) throws Exception {
        long startTime = System.currentTimeMillis();
        TorchRecRequest request = buildPredictRequest();
        request.setDebugLevel(1);
        TorchRecPredictProtos.PBResponse response = client.predict(request);
        Map<String, String> genFeas = response.getGenerateFeaturesMap();
        for (String itemId : genFeas.keySet()) {
            System.out.println(itemId);
            System.out.println(genFeas.get(itemId));
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");
    }
}
Where:
- client.setToken("tokenGeneratedFromService"): Replace the argument with your service token, for example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
- client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com"): Replace the argument with your service endpoint.
- client.setModelName("alirec_rank_with_fg"): Replace the argument with your service name.
Using the EAS Python SDK
Before you run the code, run the pip install -U eas-prediction --user command to install or update the eas-prediction library. For more information, see Use the Python SDK. The following code provides an example:
from eas_prediction import PredictClient
from eas_prediction.torchrec_request import TorchRecRequest

if __name__ == '__main__':
    endpoint = 'http://localhost:6016'
    client = PredictClient(endpoint, '<YOUR_SERVICE_NAME>')
    client.set_token('<your_service_token>')
    client.init()

    torchrec_req = TorchRecRequest()
    torchrec_req.add_user_fea('user_id', 'u001d', "STRING")
    torchrec_req.add_user_fea('age', 12, "INT")
    torchrec_req.add_user_fea('weight', 129.8, "FLOAT")
    torchrec_req.add_item_id('item_0001')
    torchrec_req.add_item_id('item_0002')
    torchrec_req.add_item_id('item_0003')
    torchrec_req.add_user_fea("raw_3", [0.24689289764507472, 0.005758482924454689, 0.6765301324940026, 0.18137273055602343], "list<double>")
    torchrec_req.add_user_fea("raw_4", [0.9965264740966043, 0.659596586238391, 0.16396649403055896, 0.08364986620265635], "list<double>")
    torchrec_req.add_user_fea("map_1", {"0": 0.37845234405201145}, "map<int,float>")
    torchrec_req.add_user_fea("map_2", {"866": 4143, "1627": 2451}, "map<int,int>")
    torchrec_req.add_context_fea("id_2", [866], "list<int>")
    torchrec_req.add_context_fea("id_2", [7022, 1], "list<int>")
    torchrec_req.add_context_fea("id_2", [7022, 1], "list<int>")
    torchrec_req.add_user_fea("click", [[0.94433516, 0.49145547], [0.94433516, 0.49145597]], "list<list<float>>")

    res = client.predict(torchrec_req)
    print(res)
Where:
- endpoint: Set this to your service endpoint, for example, http://175805416243****.cn-beijing.pai-eas.aliyuncs.com/.
- <YOUR_SERVICE_NAME>: Replace this with your service name.
- <your_service_token>: Replace this with your service token, for example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
Not using FG (fg_mode='bypass')
Using the EAS Java SDK
Before you run the code, configure the Maven environment. For more information, see Use the Java SDK. Check the GitHub page for the latest version number of the SDK. The following code provides an example of how to request the alirec_rank_no_fg service:
package com.aliyun.openservices.eas.predict;

import java.util.List;
import java.util.Arrays;
import com.aliyun.openservices.eas.predict.http.PredictClient;
import com.aliyun.openservices.eas.predict.http.HttpConfig;
import com.aliyun.openservices.eas.predict.request.TorchDataType;
import com.aliyun.openservices.eas.predict.request.TorchRequest;
import com.aliyun.openservices.eas.predict.response.TorchResponse;

public class Test_Torch {
    public static PredictClient InitClient() {
        return new PredictClient(new HttpConfig());
    }

    public static TorchRequest buildPredictRequest() {
        TorchRequest request = new TorchRequest();
        float[] content = new float[2304000];
        for (int i = 0; i < content.length; i++) {
            content[i] = (float) 0.0;
        }
        long[] content_i = new long[900];
        for (int i = 0; i < content_i.length; i++) {
            content_i[i] = 0;
        }
        long[] a = Arrays.copyOfRange(content_i, 0, 300);
        float[] b = Arrays.copyOfRange(content, 0, 230400);
        request.addFeed(0, TorchDataType.DT_INT64, new long[]{300, 3}, content_i);
        request.addFeed(1, TorchDataType.DT_FLOAT, new long[]{300, 10, 768}, content);
        request.addFeed(2, TorchDataType.DT_FLOAT, new long[]{300, 768}, b);
        request.addFeed(3, TorchDataType.DT_INT64, new long[]{300}, a);
        request.addFetch(0);
        request.setDebugLevel(903);
        return request;
    }

    public static void main(String[] args) throws Exception {
        PredictClient client = InitClient();
        client.setToken("tokenGeneratedFromService");
        client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com");
        client.setModelName("alirec_rank_no_fg");
        client.setIsCompressed(false);
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            TorchResponse response = null;
            try {
                response = client.predict(buildPredictRequest());
                List<Float> result = response.getFloatVals(0);
                System.out.print("Predict Result: [");
                for (int j = 0; j < result.size(); j++) {
                    System.out.print(result.get(j).floatValue());
                    if (j != result.size() - 1) {
                        System.out.print(", ");
                    }
                }
                System.out.print("]\n");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");
        client.shutdown();
    }
}
Where:
- client.setToken("tokenGeneratedFromService"): Replace the argument with your service token, for example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
- client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com"): Replace the argument with your service endpoint.
- client.setModelName("alirec_rank_no_fg"): Replace the argument with your service name.
Using the EAS Python SDK
Before you run the code, run the pip install -U eas-prediction --user command to install or update the eas-prediction library. For more information, see Use the Python SDK. The following code provides an example of how to request the alirec_rank_no_fg service:
from eas_prediction import PredictClient
from eas_prediction import TorchRequest
# Disable Snappy compression for the request data
req = TorchRequest(False)
req.add_feed(0, [300, 3], TorchRequest.DT_INT64, [1] * 900)
req.add_feed(1, [300, 10, 768], TorchRequest.DT_FLOAT, [1.0] * 3 * 768000)
req.add_feed(2, [300, 768], TorchRequest.DT_FLOAT, [1.0] * 3 * 76800)
req.add_feed(3, [300], TorchRequest.DT_INT64, [1] * 300)
client = PredictClient('<your_endpoint>', '<your_service_name>')
client.set_token('<your_service_token>')
client.init()
resp = client.predict(req)
print(resp)
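As a sanity check on the bypass example above, each feed's flat data length must equal the product of its declared shape dimensions. The counts can be verified directly (shapes taken from the example):

```python
from math import prod

# Each feed's flat data length must equal the product of its shape dims.
shapes = {
    0: [300, 3],        # DT_INT64 -> 900 values
    1: [300, 10, 768],  # DT_FLOAT -> 2,304,000 values
    2: [300, 768],      # DT_FLOAT -> 230,400 values
    3: [300],           # DT_INT64 -> 300 values
}
counts = {idx: prod(shape) for idx, shape in shapes.items()}
print(counts)  # {0: 900, 1: 2304000, 2: 230400, 3: 300}
```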
Where:
- <your_endpoint>: Replace this with your service endpoint, for example, http://175805416243****.cn-beijing.pai-eas.aliyuncs.com/.
- <your_service_name>: Replace this with your service name.
- <your_service_token>: Replace this with your service token, for example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
For more information about the status codes returned when you access the service, see Service status code description. You can also build a service request manually. For more information, see Request format.
Request format
When a client calls the service, you can manually generate the prediction request code file based on the .proto file. To build the service request manually, refer to the following protobuf definitions to generate the corresponding code:
pytorch_predict.proto: Request definition for a Torch model
syntax = "proto3";
package pytorch.eas;
option cc_enable_arenas = true;
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "TorchPredictProtos";
enum ArrayDataType {
// Not a legal value for DataType. Used to indicate a DataType field
// has not been set.
DT_INVALID = 0;
// Data types that all computation devices are expected to be
// capable to support.
DT_FLOAT = 1;
DT_DOUBLE = 2;
DT_INT32 = 3;
DT_UINT8 = 4;
DT_INT16 = 5;
DT_INT8 = 6;
DT_STRING = 7;
DT_COMPLEX64 = 8; // Single-precision complex
DT_INT64 = 9;
DT_BOOL = 10;
DT_QINT8 = 11; // Quantized int8
DT_QUINT8 = 12; // Quantized uint8
DT_QINT32 = 13; // Quantized int32
DT_BFLOAT16 = 14; // Float32 truncated to 16 bits. Only for cast ops.
DT_QINT16 = 15; // Quantized int16
DT_QUINT16 = 16; // Quantized uint16
DT_UINT16 = 17;
DT_COMPLEX128 = 18; // Double-precision complex
DT_HALF = 19;
DT_RESOURCE = 20;
DT_VARIANT = 21; // Arbitrary C++ data types
}
// Dimensions of an array
message ArrayShape {
repeated int64 dim = 1 [packed = true];
}
// Protocol buffer representing an array
message ArrayProto {
// Data Type.
ArrayDataType dtype = 1;
// Shape of the array.
ArrayShape array_shape = 2;
// DT_FLOAT.
repeated float float_val = 3 [packed = true];
// DT_DOUBLE.
repeated double double_val = 4 [packed = true];
// DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
repeated int32 int_val = 5 [packed = true];
// DT_STRING.
repeated bytes string_val = 6;
// DT_INT64.
repeated int64 int64_val = 7 [packed = true];
}
message PredictRequest {
// Input tensors.
repeated ArrayProto inputs = 1;
// Output filter.
repeated int32 output_filter = 2;
// Input tensors for rec
map<string, ArrayProto> map_inputs = 3;
// debug_level for rec
int32 debug_level = 100;
}
// Response for PredictRequest on successful run.
message PredictResponse {
// Output tensors.
repeated ArrayProto outputs = 1;
// Output tensors for rec.
map<string, ArrayProto> map_outputs = 2;
}
torchrec_predict.proto: Request definition for a Torch model with FG
syntax = "proto3";
option go_package = ".;torch_predict_protos";
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "TorchRecPredictProtos";
package com.alibaba.pairec.processor;
import "pytorch_predict.proto";
//long->others
message LongStringMap {
map<int64, string> map_field = 1;
}
message LongIntMap {
map<int64, int32> map_field = 1;
}
message LongLongMap {
map<int64, int64> map_field = 1;
}
message LongFloatMap {
map<int64, float> map_field = 1;
}
message LongDoubleMap {
map<int64, double> map_field = 1;
}
//string->others
message StringStringMap {
map<string, string> map_field = 1;
}
message StringIntMap {
map<string, int32> map_field = 1;
}
message StringLongMap {
map<string, int64> map_field = 1;
}
message StringFloatMap {
map<string, float> map_field = 1;
}
message StringDoubleMap {
map<string, double> map_field = 1;
}
//int32->others
message IntStringMap {
map<int32, string> map_field = 1;
}
message IntIntMap {
map<int32, int32> map_field = 1;
}
message IntLongMap {
map<int32, int64> map_field = 1;
}
message IntFloatMap {
map<int32, float> map_field = 1;
}
message IntDoubleMap {
map<int32, double> map_field = 1;
}
// list
message IntList {
repeated int32 features = 1;
}
message LongList {
repeated int64 features = 1;
}
message FloatList {
repeated float features = 1;
}
message DoubleList {
repeated double features = 1;
}
message StringList {
repeated string features = 1;
}
// lists
message IntLists {
repeated IntList lists = 1;
}
message LongLists {
repeated LongList lists = 1;
}
message FloatLists {
repeated FloatList lists = 1;
}
message DoubleLists {
repeated DoubleList lists = 1;
}
message StringLists {
repeated StringList lists = 1;
}
message PBFeature {
oneof value {
int32 int_feature = 1;
int64 long_feature = 2;
string string_feature = 3;
float float_feature = 4;
double double_feature=5;
LongStringMap long_string_map = 6;
LongIntMap long_int_map = 7;
LongLongMap long_long_map = 8;
LongFloatMap long_float_map = 9;
LongDoubleMap long_double_map = 10;
StringStringMap string_string_map = 11;
StringIntMap string_int_map = 12;
StringLongMap string_long_map = 13;
StringFloatMap string_float_map = 14;
StringDoubleMap string_double_map = 15;
IntStringMap int_string_map = 16;
IntIntMap int_int_map = 17;
IntLongMap int_long_map = 18;
IntFloatMap int_float_map = 19;
IntDoubleMap int_double_map = 20;
IntList int_list = 21;
LongList long_list =22;
StringList string_list = 23;
FloatList float_list = 24;
DoubleList double_list = 25;
IntLists int_lists = 26;
LongLists long_lists =27;
StringLists string_lists = 28;
FloatLists float_lists = 29;
DoubleLists double_lists = 30;
}
}
// context features
message ContextFeatures {
repeated PBFeature features = 1;
}
// PBRequest specifies the request for aggregator
message PBRequest {
// debug mode
int32 debug_level = 1;
// user features, key is user input name
map<string, PBFeature> user_features = 2;
// item ids
repeated string item_ids = 3;
// context features for each item, key is context input name
map<string, ContextFeatures> context_features = 4;
// number of nearest neighbors(items) to retrieve
// from faiss
int32 faiss_neigh_num = 5;
// item features for each item, key is item input name
map<string, ContextFeatures> item_features = 6;
// optional meta data
map<string, string> meta_data = 7;
}
// PBResponse specifies the response for aggregator
message PBResponse {
// torch output tensors
map<string, pytorch.eas.ArrayProto> map_outputs = 1;
// fg output features
map<string, string> generate_features = 2;
// all fg input features
map<string, string> raw_features = 3;
// item ids
repeated string item_ids = 4;
}
The following table describes the debug_level parameter. By default, you do not need to set this parameter; pass it only when debugging.

| debug_level | Description |
| --- | --- |
| 0 | The service performs prediction normally. |
| 1 | In normal mode, validates the request keys and the shapes of the FG input and output, and returns the input and output features without performing prediction. |
| 2 | In normal mode, validates the request keys and the shapes of the FG input and output, returns the input and output features along with the model input tensor, and performs prediction. |
| 3 | In normal mode, validates the request keys and the shapes of the FG input and output, and returns the output features without performing prediction. |
| 100 | In normal mode, saves the prediction request. The saved protobuf file contains the original request and the item-side input and output features. The save path is specified by the request_log_path parameter. |
| 102 | In normal mode, performs vector recall, validates the request keys and the shapes of the FG input and output, and saves the input and output features, the model input tensor, and the user embedding result. |
| 903 | Prints the prediction time for each stage. |
| 904 | Checks for missing feature fields in the request and records them in the log. |
Service status code description
The following table describes the main status codes that may be returned when you access a TorchEasyRec service. For more information about the status codes returned when you access an EAS service, see Appendix: Service status codes and common errors.
| Status code | Description |
| --- | --- |
| 200 | The service returns a normal response. |
| 400 | The request input is invalid. |
| 500 | The prediction failed. Check the service log for details. |
Save and parse request protobuf files
For processor version 1.12 and later, when debug=True is enabled for the request body of the PAI-Rec engine, the processor saves the original request and the input and output features of the item side to a protobuf file on disk. This supports subsequent feature analysis and verification. The protobuf file contains the original request data, the input features on the item side, and the transformed features on the item side. To use this feature, configure the request_log_path parameter to specify the save path and mount an OSS path to it. For example:
"model_config": {
"fg_mode": "normal",
"fg_threads": 8,
"request_log_path": "/request_log",
"background_feature_thread_num": 8
},
"storage": [
{
"mount_path": "/request_log",
"oss": {
"path": "oss://my-bucket/my-model/myrequests/",
"readOnly": false
}
},
{
"mount_path": "/home/admin/docker_ml/workspace/model/",
"oss": {
"path": "oss://my-bucket/my-model/20260316",
"readOnly": false
}
}
]
The processor creates a date_hour subdirectory in the path specified by request_log_path and saves the request data. Disk writes are performed asynchronously by a background thread. The number of background threads is configurable via the model_config.background_feature_thread_num parameter. The default value is 4. You can increase this value to improve write speed. The saved protobuf file name follows the format <request_id>_<random_str>.pb. Because OSS write bandwidth is limited, the traffic of request bodies with debug mode enabled in the PAI-Rec engine should remain moderate. If traffic is too high and writes fall behind, the internal queue of the model service discards new requests without saving them.
To parse the obtained protobuf files, use EAS-Python-SDK version 0.35 or later, or EAS-Java-SDK version 2.0.29 or later. The following code provides a Python example:
from eas_prediction.torchrec_predict_pb2 import PBLogData
with open('xxxx.pb', 'rb') as f:
    pb_data = f.read()
pb_log = PBLogData()
pb_log.ParseFromString(pb_data)
print(pb_log) # Print all logs
print(pb_log.request) # Print the request
print(pb_log.raw_features) # Print the raw item-side features
print(pb_log.generate_features) # Print the item-side features after feature generation (fg)