Deploying and calling a TorchEasyRec model service - Platform For AI

The built-in TorchEasyRec Processor in EAS deploys recommendation models trained with TorchEasyRec or PyTorch as scoring services with integrated feature engineering. The processor jointly optimizes feature engineering and the PyTorch model to deliver high-performance scoring services. This topic explains how to deploy and call a TorchEasyRec model service.

Background

The following diagram shows the architecture of a recommendation engine based on the TorchEasyRec Processor.

The TorchEasyRec Processor consists of the following modules:

Item Feature Cache: Caches item-side features from FeatureStore in memory to reduce network overhead, ease the load on FeatureStore, and boost inference service performance. When item-side features include real-time features, FeatureStore handles their synchronization.

Feature Generator (FG): Defines feature transformations in a configuration file and uses a single C++ codebase so that offline and online feature processing follow the same logic.
TorchModel: A PyTorch model trained with TorchEasyRec or PyTorch and exported as a ScriptedModel.
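To make the data flow between these modules concrete, here is a minimal Python sketch of how one scoring request might pass through them. It is not the processor's implementation: `ItemFeatureCache`, `run_fg`, `score_request`, and `toy_model` are hypothetical names, and the FG and model are trivial stand-ins. The one documented behavior it reflects is that items that cannot be found receive `item_empty_score`.

```python
# Hypothetical sketch of the scoring flow: item feature cache -> FG -> model.
# All names are illustrative; they are not the TorchEasyRec Processor API.
from typing import Callable, Dict, List, Optional

Features = Dict[str, object]


class ItemFeatureCache:
    """In-memory item-side features, standing in for data loaded from FeatureStore."""

    def __init__(self, items: Dict[str, Features]) -> None:
        self._items = dict(items)

    def get(self, item_id: str) -> Optional[Features]:
        return self._items.get(item_id)


def run_fg(user: Features, item: Features) -> List[float]:
    """Stand-in for FG: turn raw user and item features into model inputs."""
    return [float(user.get("age", 0)), float(item.get("price", 0.0))]


def score_request(
    user: Features,
    item_ids: List[str],
    cache: ItemFeatureCache,
    model: Callable[[List[float]], float],
    item_empty_score: float = 0.0,
) -> Dict[str, float]:
    """Score each candidate item; items missing from the cache get item_empty_score."""
    scores: Dict[str, float] = {}
    for item_id in item_ids:
        item = cache.get(item_id)  # local memory lookup, no FeatureStore round trip
        if item is None:
            scores[item_id] = item_empty_score
            continue
        scores[item_id] = model(run_fg(user, item))
    return scores


def toy_model(inputs: List[float]) -> float:
    """Placeholder for the exported scripted PyTorch model."""
    return 0.1 * inputs[0] + 0.01 * inputs[1]


if __name__ == "__main__":
    cache = ItemFeatureCache({"item_1": {"price": 9.5}, "item_2": {"price": 3.0}})
    print(score_request({"age": 30}, ["item_1", "item_3"], cache, toy_model, -1.0))
```

The point of the cache in this picture is that the per-item lookup happens in local memory, so only the FG and model stages do real work per request.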

Limitations

Only the g6, g7, and g8 instance types of the general-purpose instance family and instance types with GPUs such as T4 and A10 are supported. For more information, see General-purpose instance family (g series). If you deploy a GPU service, make sure that the CUDA driver version is 535 or later.
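A quick way to check the driver requirement is to compare the major component of the driver version against 535. The sketch below assumes you obtain the version string yourself, for example from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`; `driver_is_supported` is a hypothetical helper name.

```python
# Minimal check that an NVIDIA driver version string meets the 535 minimum.
MIN_DRIVER_MAJOR = 535


def driver_is_supported(version: str, minimum: int = MIN_DRIVER_MAJOR) -> bool:
    """Return True if the major component of `version` (e.g. "535.104.05") >= minimum."""
    major = version.strip().split(".")[0]
    if not major.isdigit():
        raise ValueError(f"unrecognized driver version: {version!r}")
    return int(major) >= minimum
```

For example, `driver_is_supported("535.104.05")` returns `True`, while `driver_is_supported("470.82.01")` returns `False`.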

Version history

The TorchEasyRec Processor is under active development. We recommend that you deploy your inference service with the latest version, which provides more features and better inference performance. The released versions are as follows:

Processor	Date	Torch version	FG version	Updates
easyrec-torch-0.1	2024-09-10	2.4	0.2.9	Added support for Feature Generator (FG) and FeatureStore item feature cache. Added support for CPU and GPU inference for PyTorch models. Added support for automatic expansion of `Input_Tile` user features. Added support for Faiss vector recall. Added support for model warm-up in `normal` mode.
easyrec-torch-0.2	2024-09-30	2.4	0.2.9	FeatureDB now supports complex data types. Accelerated data loading for FeatureStore initialization. Optimized the `debug_level` in `bypass` mode. Optimized host-to-device (H2D) data transfer.
easyrec-torch-0.3	2024-10-14	2.4	0.2.9	Added support for JSON-based initialization in FeatureStore. Redefined the Protobuf definitions.
easyrec-torch-0.4	2024-10-28	2.4	0.3.1	Fixed an issue with complex data types in Feature Generator (FG).
easyrec-torch-0.5	2024-11-14	2.4	0.3.1	Optimized online and offline consistency. When debug mode is enabled, feature information is generated after FG processing, even if the item does not exist.
easyrec-torch-0.6	2024-11-18	2.4	0.3.6	Optimized the packaging process by removing redundant header files.
easyrec-torch-0.7	2024-12-06	2.5	0.3.9	Added support for array-type sequence primary keys. Upgraded PyTorch to version 2.5. Upgraded Feature Generator (FG) to version 0.3.9.
easyrec-torch-0.8	2024-12-25	2.5	0.3.9	Upgraded the TensorRT SDK to version 2.5. Added support for the `int64` data type for model inputs. Upgraded the FeatureStore version to resolve an issue with feature queries in Hologres. Optimized runtime efficiency and logic in debug mode. Added `item_features` to the Protobuf definition to pass item features from the request.
easyrec-torch-0.9	2025-01-15	2.5	0.4.1	Upgraded Feature Generator (FG) to version 0.4.1 to optimize initialization time in multi-threaded environments.
easyrec-torch-1.0	2025-02-06	2.5	0.4.2	Added support for weighted features. Upgraded Feature Generator (FG) to version 0.4.2. Added support for AMD CPUs.
easyrec-torch-1.1	2025-04-23	2.5	0.5.9	Upgraded the FeatureStore SDK. The new version provides high-speed connectivity to FeatureDB over a VPC network and filters expired real-time features in memory based on `event_time` and `ttl`. Upgraded the Feature Generator (FG). This version adds support for custom sequence features and fixes issues related to combo features.
easyrec-torch-1.2	2025-05-12	2.5	0.6.0	Upgraded Feature Generator (FG) to version 0.6.0. Added support for reading features from multiple FeatureStore entities, for example, `config["fs_entity"] = "item,raw";`. In debug mode, the processor now outputs the IDs of items in the request that are not found in FeatureStore.
easyrec-torch-1.3	2025-05-29	2.5	0.6.5	Upgraded Feature Generator (FG) to version 0.6.5. Added FSMAP support for weighted ID features. Added support for WordPiece tokenization. Added a `boolean_mask` filter operator. Enhanced the expression feature operator.
easyrec-torch-1.4	2025-07-15	2.5	0.6.9	Upgraded Feature Generator (FG) to version 0.6.9. Added new functions to the expression feature operator. Moved the debug string generation logic from the processor to the FG library.
easyrec-torch-1.5	2025-09-18	2.5	0.7.3	Upgraded Feature Generator (FG) to version 0.7.3. Added support for capturing online requests for model warm-up. Upgraded the FeatureStore SDK to version 20250826, which supports three-level table schemas in MaxCompute and zero-trust calls without an AccessKey, and is compatible with features added to an existing feature view.
easyrec-torch-1.6	2025-10-21	2.5	0.7.4	Optimized log control to prevent performance degradation from excessive logging during high callback volume. Optimized context feature processing. Feature preprocessing and FG now share a thread pool to conserve thread resources. Upgraded Feature Generator (FG) to version 0.7.4.
easyrec-torch-1.7	2025-11-04	2.5	0.7.4	Optimized the logic for saving debug tensors to prevent excessive file creation triggered by callbacks.
easyrec-torch-1.8	2025-12-01	2.5	0.7.4	Optimized the FeatureStore SDK thread pool to prevent thread creation failures under high resource pressure. Upgraded the FeatureStore SDK to version 20251117.
easyrec-torch-1.9	2026-01-09	2.5	1.0.0	Enabled CUDA multi-stream for GPU inference to improve system throughput and performance. Upgraded Feature Generator (FG) to version 1.0.0.
easyrec-torch-1.10	2026-01-23	2.5	1.0.1	Enabled automatic logging of the execution time for slow requests. Added a configuration parameter to save request data when a slow request is detected.
easyrec-torch-1.11	2026-02-10	2.5	1.0.1	Fixed a memory contiguity issue with output tensors in specific scenarios. Upgraded the FeatureStore SDK to version 20260202.
easyrec-torch-1.12	2026-03-13	2.5	1.0.1	In debug mode for PAI-Rec engine requests, the model service now asynchronously saves the original request and item-side features (pre- and post-FG processing) to a disk in Protobuf format. You can specify the save path by using the `request_log_path` parameter and mounting an OSS bucket to this path at startup. Upgraded the FeatureStore SDK to version 20260305.
Notes on version 2.0 and later: `easyrec-torch-2.0` upgrades the GLIBC version in the EAS backend base image. Therefore, when you deploy version 2.0 or later of the processor, note the following: If you create a new EAS service, follow the standard deployment procedure, which is the same as for versions 0.x and 1.x. If you upgrade an existing EAS service that was created before March 15, 2026, contact an Alibaba Cloud technical expert to upgrade the backend base image of the service before you upgrade the processor. Otherwise, the deployment may fail because of an incompatible runtime environment.
easyrec-torch-2.0	2026-03-17	2.8	1.0.1	Upgraded the PyTorch runtime to 2.8. Upgraded the CUDA runtime to 12.6. Upgraded the fbgemm_gpu runtime to 1.3. Upgraded the GLIBC version in the base image to 2.38.
easyrec-torch-2.1	2026-04-09	2.8	1.0.2	Fixed online and offline consistency issues caused by missing feature values. Set the default value of the `fg_threads` parameter to the number of logical CPU cores. Added support for capturing performance logs by using the Kineto profiler. Upgraded the FeatureStore SDK to version 20260402.
easyrec-torch-2.2	2026-04-29	2.8	1.0.5	Added support for DLRM-HSTU inference. Fixed a deployment error that occurred in CPU-only environments for versions 2.0 and 2.1. Upgraded the FeatureStore SDK to version 20260416.
easyrec-torch-2.3	2026-06-08	2.11	1.0.5	Upgraded the PyTorch runtime to 2.11. Upgraded the CUDA runtime to 12.9. Upgraded the FeatureStore SDK to version 20260518. This version resolves an issue where the service would occasionally hang when loading features in a multi-threaded environment. Changed the release package format to `tar.zst` for faster decompression during online startup. Fixed an issue with the `cand_seq` sub-feature in HSTU. Added support for data passthrough.
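When you pick a version programmatically, note that plain string comparison ranks `easyrec-torch-1.9` above `easyrec-torch-1.12`. The following sketch parses the version numerically and also encodes the 2.0 base-image note from the table. The helper names are hypothetical, and the cutoff date is taken from that note.

```python
# Hypothetical helpers for ordering easyrec-torch processor versions.
from datetime import date
from typing import Iterable, Tuple

PREFIX = "easyrec-torch-"
BASE_IMAGE_CUTOFF = date(2026, 3, 15)  # from the version 2.0 note above


def parse_version(processor: str) -> Tuple[int, ...]:
    """Turn "easyrec-torch-1.12" into (1, 12) so comparisons are numeric."""
    if not processor.startswith(PREFIX):
        raise ValueError(f"not an easyrec-torch processor: {processor!r}")
    return tuple(int(part) for part in processor[len(PREFIX):].split("."))


def latest(processors: Iterable[str]) -> str:
    return max(processors, key=parse_version)


def needs_base_image_upgrade(processor: str, service_created: date) -> bool:
    """True when upgrading an existing service created before the cutoff to 2.0+."""
    return parse_version(processor) >= (2, 0) and service_created < BASE_IMAGE_CUTOFF
```

For instance, `latest(["easyrec-torch-1.9", "easyrec-torch-1.12"])` returns `"easyrec-torch-1.12"`, whereas the built-in `max` on the raw strings would pick `"easyrec-torch-1.9"`.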

Step 1: Deploy a service

Prepare the torcheasyrec.json service configuration file.

Set the processor type to easyrec-torch-{version}, where {version} is one of the versions listed in the version history. The following examples show the JSON configuration file:

Example with FG (fg_mode='normal')

{
  "metadata": {
    "instance": 1,
    "name": "alirec_rank_with_fg",
    "rpc": {
      "enable_jemalloc": 1,
      "max_queue_size": 256,
      "worker_threads": 16
    }
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6i-c16g1.4xlarge"
    }
  },
  "model_config": {
    "fg_mode": "normal",
    "fg_threads": 8,
    "region": "YOUR_REGION",
    "fs_project": "YOUR_FS_PROJECT",
    "fs_model": "YOUR_FS_MODEL",
    "fs_entity": "item",
    "load_feature_from_offlinestore": true,
    "access_key_id":"YOUR_ACCESS_KEY_ID",
    "access_key_secret":"YOUR_ACCESS_KEY_SECRET"
  },
  "storage": [
    {
      "mount_path": "/home/admin/docker_ml/workspace/model/",
      "oss": {
        "path": "oss://xxx/xxx/export",
        "readOnly": false
      },
      "properties": {
        "resource_type": "code"
      }
    }
  ],
  "processor":"easyrec-torch-1.12"
}

Example without FG (fg_mode='bypass')

{
  "metadata": {
    "instance": 1,
    "name": "alirec_rank_no_fg",
    "rpc": {
      "enable_jemalloc": 1,
      "max_queue_size": 256,
      "worker_threads": 16
    }
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6i-c16g1.4xlarge"
    }
  },
  "model_config": {
    "fg_mode": "bypass"
  },
  "storage": [
    {
      "mount_path": "/home/admin/docker_ml/workspace/model/",
      "oss": {
        "path": "oss://xxx/xxx/export",
        "readOnly": false
      },
      "properties": {
        "resource_type": "code"
      }
    }
  ],
  "processor":"easyrec-torch-1.12"
}
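The two examples differ only in the service name and in `model_config`: FG-related and FeatureStore settings appear only in `normal` mode. As a sketch, both files can be generated from one template. `build_config` and its `normal_settings` argument are hypothetical, and every `YOUR_*` and `oss://xxx` value is a placeholder you must replace.

```python
# Hypothetical generator for the two torcheasyrec.json variants shown above.
import json
from typing import Any, Dict, Optional


def build_config(
    name: str,
    fg_mode: str,
    processor: str = "easyrec-torch-1.12",
    normal_settings: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    if fg_mode not in ("normal", "bypass"):
        raise ValueError("fg_mode must be 'normal' or 'bypass'")
    model_config: Dict[str, Any] = {"fg_mode": fg_mode}
    if fg_mode == "normal":
        # FG threads and FeatureStore access are only needed when FG is enabled.
        model_config.update(normal_settings or {})
    return {
        "metadata": {
            "instance": 1,
            "name": name,
            "rpc": {"enable_jemalloc": 1, "max_queue_size": 256, "worker_threads": 16},
        },
        "cloud": {"computing": {"instance_type": "ecs.gn6i-c16g1.4xlarge"}},
        "model_config": model_config,
        "storage": [
            {
                "mount_path": "/home/admin/docker_ml/workspace/model/",
                "oss": {"path": "oss://xxx/xxx/export", "readOnly": False},
                "properties": {"resource_type": "code"},
            }
        ],
        "processor": processor,
    }


if __name__ == "__main__":
    settings = {
        "fg_threads": 8,
        "region": "YOUR_REGION",
        "fs_project": "YOUR_FS_PROJECT",
        "fs_model": "YOUR_FS_MODEL",
        "fs_entity": "item",
        "load_feature_from_offlinestore": True,
        "access_key_id": "YOUR_ACCESS_KEY_ID",
        "access_key_secret": "YOUR_ACCESS_KEY_SECRET",
    }
    cfg = build_config("alirec_rank_with_fg", "normal", normal_settings=settings)
    print(json.dumps(cfg, indent=2))
```

Python's `True`/`False` are serialized by `json.dumps` as JSON `true`/`false`, which is the form the configuration file expects.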

For other parameters, see JSON deployment.

Parameter	Required	Description	Example
processor	Yes	The TorchEasyRec Processor.	"processor":"easyrec-torch-1.12"
path	Yes	The OSS path mounted to the service to store model files.	"path": "oss://examplebucket/xxx/export"
fg_mode	No	The feature engineering mode. Valid values: bypass (default): Disables FG and deploys only the Torch model. Use this mode if you process features yourself. In this mode, the processor does not require FeatureStore access parameters. normal: Enables FG. This mode is typically used for models trained with TorchEasyRec.	"fg_mode": "normal"
fg_threads	No	The number of concurrent threads used to run FG for each request. In easyrec-torch-2.1 and later, the default value is the number of logical CPU cores.	"fg_threads": 15
outputs	No	The names of the output variables from the Torch model prediction, such as `probs_ctr`. Use commas (,) to separate multiple names. If this parameter is not specified, the service returns all variables.	"outputs":"probs_ctr,probs_cvr"
item_empty_score	No	The default score to return when an item ID does not exist. Default value: 0.	"item_empty_score": -1
Processor vector recall parameters
faiss_neigh_num	No	The number of items to retrieve for FAISS vector recall. The service prioritizes the value from the `faiss_neigh_num` field in the request body. If this field is not provided, it uses the value from the `faiss_neigh_num` parameter in the `model_config` section. The default value is 1.	"faiss_neigh_num": 200
faiss_nprobe	No	The `nprobe` parameter specifies the number of clusters to search during retrieval. In FAISS, an inverted file index divides data into smaller clusters and maintains an inverted list for each. A larger `nprobe` value generally improves recall accuracy at the cost of increased computation and search time, while a smaller value reduces accuracy but speeds up the search. The default value is 800.	"faiss_nprobe" : 700
Processor parameters for FeatureStore access
fs_project	No	The name of your FeatureStore project. This parameter is required when using FeatureStore. For more information, see Configure a FeatureStore project.	"fs_project": "fs_demo"
fs_model	No	The name of the feature model in FeatureStore.	"fs_model": "fs_rank_v1"
fs_entity	No	The entity name in FeatureStore.	"fs_entity": "item"
region	No	The region where your FeatureStore project is located. For example, specify `cn-beijing` for the China (Beijing) region. For a list of supported regions and their endpoints, see Endpoints.	"region": "cn-beijing"
access_key_id	No	The AccessKey ID for accessing FeatureStore.	"access_key_id": "xxxxx"
access_key_secret	No	The AccessKey Secret for accessing FeatureStore.	"access_key_secret": "xxxxx"
load_feature_from_offlinestore	No	Specifies whether to load offline features directly from the FeatureStore OfflineStore. Valid values: true: Loads data from the FeatureStore OfflineStore. false (default): Loads data from the FeatureStore OnlineStore.	"load_feature_from_offlinestore": true
featuredb_username	No	The username for FeatureDB.	"featuredb_username":"xxx"
featuredb_password	No	The password for FeatureDB.	"featuredb_passwd":"xxx"
Parameters for automatic feature expansion (input_tile)
INPUT_TILE	No	Enables automatic feature expansion to improve performance. For features that have the same value for all items in a request, such as `user_id`, you send the value only once. This reduces the request payload size, network latency, and computation time. This feature requires `normal` mode and a model trained with TorchEasyRec, and the corresponding environment variable must also be set when the model is exported. By default, the system reads the `INPUT_TILE` value from the `model_acc.json` file in the exported model directory. If this file is missing, the system reads the value from the environment variable. When this feature is enabled: If the value is 2, FG for user-side features is computed only once per request. If the value is 3, FG for user-side features is also computed only once, and in addition the system computes user and item embeddings separately so that the user-side embedding is computed only once. A value of 3 suits scenarios with many user-side features.	"processor_envs": [ { "name": "INPUT_TILE", "value": "2" } ]
NO_GRAD_GUARD	No	Disables gradient calculation during inference, so operations are not tracked and no computation graph is built. Note: Setting this parameter to `1` may be incompatible with some models. If the service hangs on the second inference run, set the `PYTORCH_TENSOREXPR_FALLBACK=2` environment variable. This skips the compilation step while retaining some graph optimizations.	"processor_envs": [ { "name": "NO_GRAD_GUARD", "value": "1" } ]
Model warm-up parameters
warmup_data_path	No	Enables the model warm-up feature and specifies the path to save warm-up files. To persist the warm-up files, you must mount an OSS path to this location in the `storage` configuration.	"warmup_data_path": "/warmup"
warmup_cnt_per_file	No	The number of times to run the warm-up process for each Protobuf file. A higher value ensures a more thorough warm-up but increases the startup time. Default value: 20.	"warmup_cnt_per_file": 20
warmup_pb_files_count	No	The number of online requests to save as Protobuf files for the next service startup. The files are used for model warm-up and are saved to the path specified by `warmup_data_path`. Default value: 64.	"warmup_pb_files_count": 64
Slow request logging and saving
long_request_threshold	No	The time threshold in milliseconds (ms) for identifying a slow request. If a request's processing time exceeds this threshold, the system automatically logs the execution time of each stage. Default value: 200.	"long_request_threshold": 200
save_long_request	No	Specifies whether to save requests that exceed the `long_request_threshold` as Protobuf files. If set to `true`, the files are saved to the `torch_req` directory under the model directory. Default value: `false`.	"save_long_request": true
Saving raw requests and item features to OSS
request_log_path	No	The disk path for saving the Protobuf files. You must mount an OSS path to this location in the service configuration.	"request_log_path": "/online_log_pb"
background_feature_thread_num	No	The number of background threads dedicated to writing files to disk. If the disk-writing workload is heavy, you can increase this value to speed up the saving process. Default value: 4.	"background_feature_thread_num": 8
Pass-through data configuration
pass_through_data	No	Specifies data to pass through to the response. This is useful for passing information to downstream services. The value must be a JSON object.	"pass_through_data": {"model_version": "20260513"}
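Several parameters above have documented defaults, and `faiss_neigh_num` has an explicit precedence rule (request body, then `model_config`, then 1). The sketch below encodes those rules so you can reason about the effective settings of a configuration. The helper names are hypothetical and the processor's internals may differ; `processor_envs` only builds the list shape used in the table examples.

```python
# Hypothetical helpers that apply the documented defaults and overrides.
from typing import Any, Dict, List

# Defaults as stated in the parameter table.
DEFAULTS: Dict[str, Any] = {
    "fg_mode": "bypass",
    "item_empty_score": 0,
    "faiss_neigh_num": 1,
    "faiss_nprobe": 800,
    "load_feature_from_offlinestore": False,
    "warmup_cnt_per_file": 20,
    "warmup_pb_files_count": 64,
    "long_request_threshold": 200,  # milliseconds
    "save_long_request": False,
    "background_feature_thread_num": 4,
}


def effective_config(model_config: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay user-supplied model_config values on the documented defaults."""
    merged = dict(DEFAULTS)
    merged.update(model_config)
    return merged


def effective_faiss_neigh_num(request: Dict[str, Any], model_config: Dict[str, Any]) -> int:
    """The request body wins; otherwise model_config; otherwise the default of 1."""
    if "faiss_neigh_num" in request:
        return int(request["faiss_neigh_num"])
    return int(effective_config(model_config)["faiss_neigh_num"])


def is_slow(elapsed_ms: float, model_config: Dict[str, Any]) -> bool:
    """A request counts as slow when it exceeds long_request_threshold."""
    return elapsed_ms > effective_config(model_config)["long_request_threshold"]


def processor_envs(**envs: Any) -> List[Dict[str, str]]:
    """Build a processor_envs list; values are strings, as in the table examples."""
    return [{"name": name, "value": str(value)} for name, value in envs.items()]
```

For example, `processor_envs(INPUT_TILE=2)` yields `[{"name": "INPUT_TILE", "value": "2"}]`, matching the `INPUT_TILE` row above.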

Deploy the TorchEasyRec model service using one of the following methods:
JSON (Recommended)
Follow these steps:
1. Log on to the PAI console. Select a region on the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
2. On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Custom Model Deployment section, click JSON Deployment.
3. In the JSON editor, paste your JSON configuration, and then click Deploy.
eascmd CLI
1. Download and authenticate the client. The following steps use the 64-bit Windows version as an example.
2. From the directory containing your JSON file, run the following command to create a service. For more information about the command, see Command reference.
```
eascmdwin64.exe create <service.json>
```
  Replace <service.json> with the name of your JSON file, for example, torcheasyrec.json.
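Before you paste the JSON into the console or pass it to `eascmd create`, a local sanity check can catch obvious mistakes. The sketch below checks only rules stated in the parameter table: a processor named `easyrec-torch-{version}`, an OSS path for the model files, and, assuming the service reads item features from FeatureStore in `normal` mode, an `fs_project`. `validate_service_config` and the script name in the usage string are hypothetical; this is not an EAS validator.

```python
# Hypothetical pre-deployment check for a TorchEasyRec service JSON file.
import json
import sys
from typing import Any, Dict, List


def validate_service_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []
    if not str(cfg.get("processor", "")).startswith("easyrec-torch-"):
        problems.append("processor must be easyrec-torch-{version}")
    oss_paths = [s.get("oss", {}).get("path") for s in cfg.get("storage", [])]
    if not any(oss_paths):
        problems.append("storage must mount an OSS path that holds the model files")
    model_config = cfg.get("model_config", {})
    # Assumption: in normal mode the service reads item features from FeatureStore.
    if model_config.get("fg_mode", "bypass") == "normal" and not model_config.get("fs_project"):
        problems.append("fg_mode is normal but model_config.fs_project is missing")
    return problems


def main(argv: List[str]) -> int:
    if not argv:
        print("usage: python check_config.py torcheasyrec.json")
        return 0
    with open(argv[0]) as f:
        problems = validate_service_config(json.load(f))
    print("\n".join(problems) or "OK")
    return 1 if problems else 0


if __name__ == "__main__":
    main(sys.argv[1:])
```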

Step 2: Call a service

After deploying the TorchEasyRec model service, follow these steps to view the service call information:

Log in to the PAI console, select the region at the top of the page and the workspace on the right, and then click Go to EAS.
Click Invocation Information in the Service Type column for the target service to see the service endpoint and token.

The TorchEasyRec model service uses Protobuf as its input/output format. There are two calling methods, depending on whether FG is enabled:

Use FG (fg_mode='normal')

You can call the service using one of the following methods:

EAS Java SDK

Before you run the code, configure your Maven environment. For more information, see Java SDK usage instructions. For the latest version of the Java SDK, see https://github.com/pai-eas/eas-java-sdk. The following code shows how to send a request to the alirec_rank_with_fg service.

package com.aliyun.openservices.eas.predict;

import com.aliyun.openservices.eas.predict.http.HttpConfig;
import com.aliyun.openservices.eas.predict.http.PredictClient;
import com.aliyun.openservices.eas.predict.proto.TorchRecPredictProtos;
import com.aliyun.openservices.eas.predict.request.TorchRecRequest;
import com.aliyun.openservices.eas.predict.proto.TorchPredictProtos.ArrayProto;

import java.util.*;


public class TorchRecPredictTest {
    public static PredictClient InitClient() {
        return new PredictClient(new HttpConfig());
    }

    public static TorchRecRequest buildPredictRequest() {
        TorchRecRequest request = new TorchRecRequest();
        request.appendItemId("7033");

        request.addUserFeature("user_id", 33981, "int");

        ArrayList<Double> list = new ArrayList<>();
        list.add(0.24689289764507472);
        list.add(0.005758482924454689);
        list.add(0.6765301324940026);
        list.add(0.18137273055602343);
        request.addUserFeature("raw_3", list, "List<double>");

        Map<String, Integer> myMap = new LinkedHashMap<>();
        myMap.put("866", 4143);
        myMap.put("1627", 2451);
        request.addUserFeature("map_1", myMap, "map<string,int>");

        ArrayList<ArrayList<Float>> list2 = new ArrayList<>();
        ArrayList<Float> innerList1 = new ArrayList<>();
        innerList1.add(1.1f);
        innerList1.add(2.2f);
        innerList1.add(3.3f);
        list2.add(innerList1);
        ArrayList<Float> innerList2 = new ArrayList<>();
        innerList2.add(4.4f);
        innerList2.add(5.5f);
        list2.add(innerList2);
        request.addUserFeature("click", list2, "list<list<float>>");

        request.addContextFeature("id_2", list, "List<double>");
        request.addContextFeature("id_2", list, "List<double>");

        System.out.println(request.request);
        return request;
    }

    public static void main(String[] args) throws Exception{
        PredictClient client = InitClient();
        client.setToken("tokenGeneratedFromService");
        client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com");
        client.setModelName("alirec_rank_with_fg");
        client.setRequestTimeout(100000);


        testInvoke(client);
        testDebugLevel(client);
        client.shutdown();
    }

    public static void testInvoke(PredictClient client) throws Exception {
        long startTime = System.currentTimeMillis();
        TorchRecPredictProtos.PBResponse response = client.predict(buildPredictRequest());
        for (Map.Entry<String, ArrayProto> entry : response.getMapOutputsMap().entrySet()) {

            System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");

    }

    public static void testDebugLevel(PredictClient client) throws Exception {
        long startTime = System.currentTimeMillis();
        TorchRecRequest request = buildPredictRequest();
        request.setDebugLevel(1);
        TorchRecPredictProtos.PBResponse response = client.predict(request);
        Map<String, String> genFeas = response.getGenerateFeaturesMap();
        for(String itemId: genFeas.keySet()) {
            System.out.println(itemId);
            System.out.println(genFeas.get(itemId));
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");

    }
}

The key parameters are:

client.setToken("tokenGeneratedFromService"): Replace the value in the parentheses with your service token. For example: MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com"): Replace the value in the parentheses with your service endpoint. For example: 175805416243****.cn-beijing.pai-eas.aliyuncs.com.
client.setModelName("alirec_rank_with_fg"): Replace the value in the parentheses with your service name.

EAS Python SDK

Before you run the code, install or update the eas-prediction library by running the `pip install -U eas-prediction --user` command. For more configuration details, see Using the Python SDK. You can find the source code at https://github.com/pai-eas/eas-python-sdk/blob/master/eas_prediction/torchrec_request.py. The following is the sample code:

from eas_prediction import PredictClient
from eas_prediction.torchrec_request import TorchRecRequest


if __name__ == '__main__':
    endpoint = '<your_endpoint>'

    client = PredictClient(endpoint, '<your_service_name>')
    client.set_token('<your_service_token>')
    client.init()
    torchrec_req = TorchRecRequest()

    torchrec_req.add_user_fea('user_id', 'u001d', "STRING")
    torchrec_req.add_user_fea('age', 12, "INT")
    torchrec_req.add_user_fea('weight', 129.8, "FLOAT")
    torchrec_req.add_item_id('item_0001')
    torchrec_req.add_item_id('item_0002')
    torchrec_req.add_item_id('item_0003')
    torchrec_req.add_user_fea("raw_3", [0.24689289764507472, 0.005758482924454689, 0.6765301324940026, 0.18137273055602343], "list<double>")
    torchrec_req.add_user_fea("raw_4", [0.9965264740966043, 0.659596586238391, 0.16396649403055896, 0.08364986620265635], "list<double>")
    torchrec_req.add_user_fea("map_1", {"0":0.37845234405201145}, "map<int,float>")
    torchrec_req.add_user_fea("map_2", {"866":4143,"1627":2451}, "map<int,int>")
    torchrec_req.add_context_fea("id_2", [866], "list<int>")
    torchrec_req.add_context_fea("id_2", [7022, 1], "list<int>")
    torchrec_req.add_context_fea("id_2", [7022, 1], "list<int>")
    torchrec_req.add_user_fea("click", [[0.94433516,0.49145547], [0.94433516, 0.49145597]], "list<list<float>>")

    res = client.predict(torchrec_req)
    print(res)

The key settings are:

endpoint: Set this parameter to your service endpoint. For example: http://175805416243****.cn-beijing.pai-eas.aliyuncs.com/.
<your_service_name>: Replace this placeholder with your service name.
<your_service_token>: Replace this placeholder with your service token. For example: MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.

Without FG (fg_mode='bypass')

EAS Java SDK

Before you run the code, configure your Maven environment. For more information, see Use the Java SDK. To get the latest SDK version, see the project on GitHub. The following example shows how to send a request to the alirec_rank_no_fg service:

package com.aliyun.openservices.eas.predict;

import java.util.List;
import java.util.Arrays;


import com.aliyun.openservices.eas.predict.http.PredictClient;
import com.aliyun.openservices.eas.predict.http.HttpConfig;
import com.aliyun.openservices.eas.predict.request.TorchDataType;
import com.aliyun.openservices.eas.predict.request.TorchRequest;
import com.aliyun.openservices.eas.predict.response.TorchResponse;

public class Test_Torch {
    public static PredictClient InitClient() {
        return new PredictClient(new HttpConfig());
    }

    public static TorchRequest buildPredictRequest() {
        TorchRequest request = new TorchRequest();
        // Java zero-initializes primitive arrays, so no explicit fill loop is needed.
        float[] content = new float[2304000]; // 300 x 10 x 768 elements
        long[] content_i = new long[900];     // 300 x 3 elements

        long[] a = Arrays.copyOfRange(content_i, 0, 300);
        float[] b = Arrays.copyOfRange(content, 0, 230400);
        request.addFeed(0, TorchDataType.DT_INT64, new long[]{300,3}, content_i);
        request.addFeed(1, TorchDataType.DT_FLOAT, new long[]{300,10,768}, content);
        request.addFeed(2, TorchDataType.DT_FLOAT, new long[]{300,768}, b);
        request.addFeed(3, TorchDataType.DT_INT64, new long[]{300}, a);
        request.addFetch(0);
        request.setDebugLevel(903);
        return request;
    }

    public static void main(String[] args) throws Exception {
        PredictClient client = InitClient();
        client.setToken("tokenGeneratedFromService");
        client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com");
        client.setModelName("alirec_rank_no_fg");
        client.setIsCompressed(false);
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            TorchResponse response = null;
            try {
                response = client.predict(buildPredictRequest());
                List<Float> result = response.getFloatVals(0);
                System.out.print("Predict Result: [");
                for (int j = 0; j < result.size(); j++) {
                    System.out.print(result.get(j).floatValue());
                    if (j != result.size() - 1) {
                        System.out.print(", ");
                    }
                }
                System.out.print("]\n");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");
        client.shutdown();
    }
}

The key parameters are:

client.setToken("tokenGeneratedFromService"): Replace the placeholder value with your service token. For example: MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com"): Replace the placeholder value with your service endpoint. For example: 175805416243****.cn-beijing.pai-eas.aliyuncs.com.
client.setModelName("alirec_rank_no_fg"): Replace the placeholder value with your service name.

EAS Python SDK

Before you run the code, run `pip install -U eas-prediction --user` to install or update the eas-prediction library. For more information, see Use the Python SDK. The following example shows how to send a request to the alirec_rank_no_fg service:

from eas_prediction import PredictClient
from eas_prediction import TorchRequest

# snappy data
req = TorchRequest(False)

req.add_feed(0, [300, 3], TorchRequest.DT_INT64, [1] * 900)
req.add_feed(1, [300, 10, 768], TorchRequest.DT_FLOAT, [1.0] * 3 * 768000)
req.add_feed(2, [300, 768], TorchRequest.DT_FLOAT, [1.0] * 3 * 76800)
req.add_feed(3, [300], TorchRequest.DT_INT64, [1] * 300)


client = PredictClient('<your_endpoint>', '<your_service_name>')
client.set_token('<your_service_token>')

client.init()

resp = client.predict(req)
print(resp)

The key settings are:

<your_endpoint>: Replace this placeholder with your service endpoint. For example: http://175805416243****.cn-beijing.pai-eas.aliyuncs.com/.
<your_service_name>: Replace this placeholder with your service name.
<your_service_token>: Replace this placeholder with your service token. For example: MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
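In bypass mode, each feed is a shape plus a flat value list, and the two examples above size those lists by hand (for example, `3 * 768000` for the `[300, 10, 768]` input). The flat length is expected to equal the product of the dimensions. The stdlib-only sketch below derives it from the shape instead; `flat_size` and `filled` are hypothetical helpers, and `math.prod` requires Python 3.8 or later.

```python
# Hypothetical helpers that size flat feed lists from their shapes.
from math import prod
from typing import List, Sequence, Union

Number = Union[int, float]


def flat_size(shape: Sequence[int]) -> int:
    """Number of elements a row-major flat list needs for `shape`."""
    if any(d <= 0 for d in shape):
        raise ValueError(f"all dimensions must be positive: {list(shape)}")
    return prod(shape)


def filled(shape: Sequence[int], value: Number) -> List[Number]:
    """Return a flat list of `value` sized to match `shape`."""
    return [value] * flat_size(shape)


if __name__ == "__main__":
    for shape in ([300, 3], [300, 10, 768], [300, 768], [300]):
        print(shape, flat_size(shape))
```

With this helper, a feed call could be written as `req.add_feed(1, [300, 10, 768], TorchRequest.DT_FLOAT, filled([300, 10, 768], 1.0))`, so the shape and the data length cannot drift apart.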

For details about service status codes, see Service status codes. To build requests yourself, see Request format.

Request format

Instead of using the EAS SDKs, you can generate request code from the following Protobuf definitions (for example, with protoc) and build requests yourself:

pytorch_predict.proto: Torch models

syntax = "proto3";

package pytorch.eas;
option cc_enable_arenas = true;
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "TorchPredictProtos";

enum ArrayDataType {
  // Not a legal value for DataType. Used to indicate that a DataType field
  // has not been set.
  DT_INVALID = 0;
  
  // Data types that all computation devices are expected to support.
  DT_FLOAT = 1;
  DT_DOUBLE = 2;
  DT_INT32 = 3;
  DT_UINT8 = 4;
  DT_INT16 = 5;
  DT_INT8 = 6;
  DT_STRING = 7;
  DT_COMPLEX64 = 8;  // Single-precision complex
  DT_INT64 = 9;
  DT_BOOL = 10;
  DT_QINT8 = 11;     // Quantized int8
  DT_QUINT8 = 12;    // Quantized uint8
  DT_QINT32 = 13;    // Quantized int32
  DT_BFLOAT16 = 14;  // Float32 truncated to 16 bits. Only for cast ops.
  DT_QINT16 = 15;    // Quantized int16
  DT_QUINT16 = 16;   // Quantized uint16
  DT_UINT16 = 17;
  DT_COMPLEX128 = 18;  // Double-precision complex
  DT_HALF = 19;
  DT_RESOURCE = 20;
  DT_VARIANT = 21;  // Arbitrary C++ data types
}

// Dimensions of an array.
message ArrayShape {
  repeated int64 dim = 1 [packed = true];
}

// Represents an array.
message ArrayProto {
  // Data type.
  ArrayDataType dtype = 1;

  // Array shape.
  ArrayShape array_shape = 2;

  // DT_FLOAT.
  repeated float float_val = 3 [packed = true];

  // DT_DOUBLE.
  repeated double double_val = 4 [packed = true];

  // DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
  repeated int32 int_val = 5 [packed = true];

  // DT_STRING.
  repeated bytes string_val = 6;

  // DT_INT64.
  repeated int64 int64_val = 7 [packed = true];

}


message PredictRequest {

  // Input tensors.
  repeated ArrayProto inputs = 1;

  // Output filter.
  repeated int32 output_filter = 2;

  // Input tensors for the recommendation model.
  map<string, ArrayProto> map_inputs = 3;

  // Debug level for the recommendation model.
  int32 debug_level = 100;
}

// Response for a successful PredictRequest.
message PredictResponse {
  // Output tensors.
  repeated ArrayProto outputs = 1;
  // Output tensors from the recommendation model.
  map<string, ArrayProto> map_outputs = 2;
}
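ArrayProto carries a tensor as a flat repeated value field plus an array_shape. The sketch below shows one way to derive those two pieces from a nested Python list. It assumes row-major (C-order) flattening, which matches PyTorch's default contiguous layout. The helper name to_array_fields is hypothetical and is not part of the EAS SDK.

```python
from typing import List, Sequence, Tuple, Union

Nested = Union[float, Sequence["Nested"]]


def to_array_fields(data: Nested) -> Tuple[List[int], List[float]]:
    """Return (dims, flat_values) for a rectangular nested list.

    dims corresponds to ArrayShape.dim and flat_values to ArrayProto.float_val,
    flattened in row-major (C) order. Hypothetical helper, not an SDK API.
    """
    dims: List[int] = []
    probe = data
    # Infer the shape by following the first element at each nesting level.
    while isinstance(probe, (list, tuple)):
        dims.append(len(probe))
        if not probe:
            break
        probe = probe[0]

    flat: List[float] = []

    def walk(node: Nested, depth: int) -> None:
        if depth == len(dims):
            if isinstance(node, (list, tuple)):
                raise ValueError("array is not rectangular")
            flat.append(float(node))
            return
        if not isinstance(node, (list, tuple)) or len(node) != dims[depth]:
            raise ValueError("array is not rectangular")
        for child in node:
            walk(child, depth + 1)

    walk(data, 0)
    return dims, flat
```

With classes generated by protoc from pytorch_predict.proto, you would then set the message's dtype to DT_FLOAT and call array_shape.dim.extend(dims) and float_val.extend(flat) on it.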

torchrec_predict.proto: Torch model with FG

syntax = "proto3";

option go_package = ".;torch_predict_protos";
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "TorchRecPredictProtos";
package com.alibaba.pairec.processor;
import "pytorch_predict.proto";

// Mappings from int64.
message LongStringMap {
  map<int64, string> map_field = 1;
}
message LongIntMap {
  map<int64, int32> map_field = 1;
}
message LongLongMap {
  map<int64, int64> map_field = 1;
}
message LongFloatMap {
  map<int64, float> map_field = 1;
}
message LongDoubleMap {
  map<int64, double> map_field = 1;
}

// Mappings from string.
message StringStringMap {
  map<string, string> map_field = 1;
}
message StringIntMap {
  map<string, int32> map_field = 1;
}
message StringLongMap {
  map<string, int64> map_field = 1;
}
message StringFloatMap {
  map<string, float> map_field = 1;
}
message StringDoubleMap {
  map<string, double> map_field = 1;
}

// Mappings from int32.
message IntStringMap {
  map<int32, string> map_field = 1;
}
message IntIntMap {
  map<int32, int32> map_field = 1;
}
message IntLongMap {
  map<int32, int64> map_field = 1;
}
message IntFloatMap {
  map<int32, float> map_field = 1;
}
message IntDoubleMap {
  map<int32, double> map_field = 1;
}

// Single-level lists.
message IntList {
  repeated int32 features = 1;
}
message LongList {
  repeated int64 features = 1;
}

message FloatList {
  repeated float features = 1;
}
message DoubleList {
  repeated double features = 1;
}
message StringList {
  repeated string features = 1;
}

// Nested lists.
message IntLists {
  repeated IntList lists = 1;
}
message LongLists {
  repeated LongList lists = 1;
}

message FloatLists {
  repeated FloatList lists = 1;
}
message DoubleLists {
  repeated DoubleList lists = 1;
}
message StringLists {
  repeated StringList lists = 1;
}

message PBFeature {
  oneof value {
    int32 int_feature = 1;
    int64 long_feature = 2;
    string string_feature = 3;
    float float_feature = 4;
    double double_feature = 5;

    LongStringMap long_string_map = 6; 
    LongIntMap long_int_map = 7; 
    LongLongMap long_long_map = 8; 
    LongFloatMap long_float_map = 9; 
    LongDoubleMap long_double_map = 10; 
    
    StringStringMap string_string_map = 11; 
    StringIntMap string_int_map = 12; 
    StringLongMap string_long_map = 13; 
    StringFloatMap string_float_map = 14; 
    StringDoubleMap string_double_map = 15; 

    IntStringMap int_string_map = 16; 
    IntIntMap int_int_map = 17; 
    IntLongMap int_long_map = 18; 
    IntFloatMap int_float_map = 19; 
    IntDoubleMap int_double_map = 20; 

    IntList int_list = 21; 
    LongList long_list = 22;
    StringList string_list = 23;
    FloatList float_list = 24;
    DoubleList double_list = 25;

    IntLists int_lists = 26;
    LongLists long_lists = 27;
    StringLists string_lists = 28;
    FloatLists float_lists = 29;
    DoubleLists double_lists = 30;
    
  }
}

// Context features.
message ContextFeatures {
  repeated PBFeature features = 1;
}

// Defines the request sent to the aggregator.
message PBRequest {
  // Debug level.
  int32 debug_level = 1;

  // User features, keyed by the input name.
  map<string, PBFeature> user_features = 2;

  // Item IDs.
  repeated string item_ids = 3;

  // Context features for each item, keyed by the input name. 
  map<string, ContextFeatures> context_features = 4;

  // The number of nearest neighbors to retrieve from Faiss.
  int32 faiss_neigh_num = 5;

  // Item features for each item, keyed by the input name. 
  map<string, ContextFeatures> item_features = 6;
  
  // Optional metadata.
  map<string, string> meta_data = 7;
}

// Defines the response from the aggregator.
message PBResponse {
  // Torch output tensors.
  map<string, pytorch.eas.ArrayProto> map_outputs = 1;

  // Output features from the feature generator (FG).
  map<string, string> generate_features = 2;

  // All input features for the feature generator (FG).
  map<string, string> raw_features = 3;

  // Item IDs.
  repeated string item_ids = 4;

  // Pass-through data configured in the model.
  map<string, string> pass_through_data = 5;
}
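PBFeature uses a oneof, so each feature value goes into exactly one of its 30 fields. The following hypothetical helper, pbfeature_field, sketches how a client might pick the field name from a Python value by inspecting its first element. Because Python numbers carry no width, it assumes that int maps to the int64 (long) variants and float to the double variants; adjust this if your FG configuration expects int32 or float values.

```python
from typing import Any


def _kind(x: Any) -> str:
    """Map a Python scalar to the type prefix used in PBFeature field names."""
    if isinstance(x, bool):  # bool is a subclass of int; PBFeature has no bool field
        raise TypeError("PBFeature has no boolean field")
    if isinstance(x, int):
        return "long"    # assumption: use the int64 variants
    if isinstance(x, float):
        return "double"  # assumption: use the double variants
    if isinstance(x, str):
        return "string"
    raise TypeError(f"unsupported type: {type(x).__name__}")


def pbfeature_field(value: Any) -> str:
    """Return the PBFeature oneof field name for a scalar, list, nested list, or dict."""
    if isinstance(value, dict):
        if not value:
            raise ValueError("cannot infer the field from an empty map")
        key, val = next(iter(value.items()))
        key_kind = _kind(key)
        if key_kind == "double":
            raise TypeError("map keys must be int or str")
        return f"{key_kind}_{_kind(val)}_map"  # e.g. string_double_map
    if isinstance(value, (list, tuple)):
        if not value:
            raise ValueError("cannot infer the field from an empty list")
        first = value[0]
        if isinstance(first, (list, tuple)):
            if not first:
                raise ValueError("cannot infer the field from an empty inner list")
            return f"{_kind(first[0])}_lists"  # e.g. long_lists
        return f"{_kind(first)}_list"  # e.g. string_list
    return f"{_kind(value)}_feature"  # e.g. long_feature
```

With protoc-generated classes, a scalar could then be set with setattr(feature, name, value), a single-level list with getattr(feature, name).features.extend(value), and a map with getattr(feature, name).map_field.update(value).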

The debug_level parameter accepts the following values:

Note

Configure this parameter only for debugging.

Value	Description
0	Performs a standard prediction.
1	Validates the request keys, performs shape validation on the FG inputs and outputs, and returns the input and output features without performing a prediction.
2	Validates the request keys, performs shape validation on the FG inputs and outputs, returns the input features, output features, and model input tensors, and then performs a prediction.
3	Validates the request keys, performs shape validation on the FG inputs and outputs, and returns the output features without performing a prediction.
100	Saves the prediction request, including the original request, the input features, the output features, and the tensors sent to the model, to a Protobuf file in the path specified by the request_log_path parameter.
102	Performs vector recall, validates the request keys, performs shape validation on the FG inputs and outputs, and saves the input features, output features, model input tensors, and user embedding result.
903	Logs the prediction time of each stage.
904	Checks the request for missing feature fields and records them in the log.
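Client code can avoid magic numbers by naming the debug_level values listed in the table. The enum names below are hypothetical; only the numeric values come from this document. skips_prediction returns True only for levels 1 and 3, the two levels that the table explicitly documents as returning features without performing a prediction.

```python
from enum import IntEnum


class DebugLevel(IntEnum):
    """Named debug_level values (the names are hypothetical)."""
    PREDICT = 0
    RETURN_FEATURES = 1
    RETURN_FEATURES_AND_TENSORS = 2
    RETURN_OUTPUT_FEATURES = 3
    SAVE_REQUEST = 100
    VECTOR_RECALL_DEBUG = 102
    LOG_STAGE_TIME = 903
    CHECK_MISSING_FEATURES = 904


def skips_prediction(level: int) -> bool:
    """Return True if the documented behavior of `level` skips the model.

    Raises ValueError for values that are not documented debug levels.
    """
    return DebugLevel(level) in (DebugLevel.RETURN_FEATURES,
                                 DebugLevel.RETURN_OUTPUT_FEATURES)
```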

Service status codes

This section describes the main status codes for the TorchEasyRec service. For EAS service status codes, see Appendix: Service status codes and common errors.

Status code	Description
200	The request was successful.
400	Invalid request.
500	Prediction failed. Check the service log for more information.
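These status codes map naturally onto client-side error handling. The following standard-library sketch posts a serialized request body and raises distinct exceptions for 400 and 500 responses. It assumes that the service URL has the form http://<endpoint>/api/predict/<your_service_name> and that the service token is passed in the Authorization header; confirm both against the invocation information of your service. The function names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Tuple


def build_predict_request(endpoint: str, service_name: str, token: str,
                          body: bytes) -> urllib.request.Request:
    """Build a POST request that carries a serialized PBRequest (sketch).

    Assumption: URL form http://<endpoint>/api/predict/<service_name>, with the
    service token in the Authorization header.
    """
    url = f"http://{endpoint}/api/predict/{service_name}"
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Authorization": token})


def send_predict_request(request: urllib.request.Request,
                         timeout: float = 5.0) -> Tuple[int, bytes]:
    """Send the request and map failures to the documented status codes."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, resp.read()  # 200: body is a serialized response
    except urllib.error.HTTPError as err:
        detail = err.read()
        if err.code == 400:
            raise ValueError(f"invalid request: {detail!r}") from err
        if err.code == 500:
            raise RuntimeError(
                f"prediction failed, check the service log: {detail!r}") from err
        raise  # other EAS status codes are passed through unchanged
```

In practice, body would come from pb_request.SerializeToString(), and the returned bytes would be parsed with PBResponse().ParseFromString(data). The EAS SDKs take care of these steps for you.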

Save and parse a Protobuf request

In processor versions 1.12 and later, you can enable debug mode by setting debug=True in the body of a PAI-REC engine request. In debug mode, the processor saves the original request and the item-side input and output features to Protobuf files for analysis and validation. To use this feature, set the request_log_path parameter in model_config to the destination path and mount an OSS path to it in the storage section. For example:

"model_config": {
    "fg_mode": "normal",
    "fg_threads": 8,
    "request_log_path": "/request_log",
    "background_feature_thread_num": 8
},
"storage": [
    {
        "mount_path": "/request_log",
        "oss": {
            "path": "oss://my-bucket/my-model/myrequests/",
            "readOnly": false
        }
    },
    {
        "mount_path": "/home/admin/docker_ml/workspace/model/",
        "oss": {
            "path": "oss://my-bucket/my-model/20260316",
            "readOnly": false
        }
    }
]

The processor saves request data to a date_hour subdirectory under the path specified by request_log_path. Background threads write the data to disk asynchronously. To configure the number of background threads, set model_config.background_feature_thread_num. The default value is 4; a larger value can improve write throughput. Each Protobuf file is named <request_id>_<random_str>.pb. Because OSS write bandwidth is limited, do not enable debug mode in the PAI-REC engine when the request volume is high. If disk writes fall behind, the internal queue of the model service discards new requests.

To parse the Protobuf files, use EAS-Python-SDK 0.35 or later or EAS-Java-SDK 2.0.29 or later. The following Python example parses a saved file:

from eas_prediction.torchrec_predict_pb2 import PBLogData

# Read a request log file saved under request_log_path.
with open('xxxx.pb', 'rb') as f:
    pb_data = f.read()

pb_log = PBLogData()
pb_log.ParseFromString(pb_data)
print(pb_log)  # Print the entire log.

print(pb_log.request)  # Print the request.
print(pb_log.raw_features)  # Print the raw item-side features.
print(pb_log.generate_features)  # Print the item-side generated features.
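Because every saved file is named <request_id>_<random_str>.pb and stored in a date_hour subdirectory, you may want to locate the files for a given request before parsing them with PBLogData. The sketch below indexes the files by request ID using only the standard library. It assumes that random_str contains no underscore, so the last underscore in a file name separates the two parts. The function names are hypothetical.

```python
import os
from typing import Dict, List, Tuple


def parse_log_name(file_name: str) -> Tuple[str, str]:
    """Split '<request_id>_<random_str>.pb' into (request_id, random_str)."""
    stem, ext = os.path.splitext(os.path.basename(file_name))
    if ext != ".pb" or "_" not in stem:
        raise ValueError(f"not a request log file name: {file_name}")
    # Assumption: random_str has no underscore, so split on the last one.
    request_id, random_str = stem.rsplit("_", 1)
    return request_id, random_str


def index_request_logs(request_log_path: str) -> Dict[str, List[str]]:
    """Map each request ID to its .pb files across all date_hour subdirectories."""
    index: Dict[str, List[str]] = {}
    for subdir in sorted(os.listdir(request_log_path)):
        hour_dir = os.path.join(request_log_path, subdir)
        if not os.path.isdir(hour_dir):
            continue
        for name in sorted(os.listdir(hour_dir)):
            try:
                request_id, _ = parse_log_name(name)
            except ValueError:
                continue  # skip files that do not follow the naming scheme
            index.setdefault(request_id, []).append(os.path.join(hour_dir, name))
    return index
```

Each path in the index can then be read and passed to PBLogData().ParseFromString as in the previous example.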

Model service warm-up

When a model service starts or is updated, its response time may spike. To mitigate these spikes, configure warm-up for the processor. In easyrec-torch-1.5 and later, you can enable warm-up by adding the following three parameters to model_config:

"warmup_data_path": "/warmup",
"warmup_cnt_per_file": 20,
"warmup_pb_files_count": 64

warmup_data_path: Enables warm-up and specifies the path of the warm-up files.
warmup_cnt_per_file: The number of warm-up iterations per file. A larger value results in a more thorough warm-up.
warmup_pb_files_count: The number of online requests to save as Protobuf files for warm-up. A larger value covers more data patterns.

To persist the warm-up Protobuf files, mount an OSS path to warmup_data_path in the storage section. For example:

"storage": [
    ...,
    {
        "mount_path": "/warmup",
        "oss": {
            "path": "oss://<warmup Protobuf file path>",
            "readOnly": false
        }
    }
]

The first time the processor starts after warm-up is configured, it captures the number of online requests specified by warmup_pb_files_count and saves them as Protobuf files. On subsequent restarts, it uses these saved files to warm up.
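The warm-up settings touch two parts of the service configuration, model_config and storage. The following sketch builds both fragments so that they can be merged into an existing service JSON file, and computes how many warm-up requests the default values imply, assuming that each saved file is replayed warmup_cnt_per_file times. The helper names and the OSS path are hypothetical.

```python
import json
from typing import Any, Dict


def warmup_settings(oss_path: str, mount_path: str = "/warmup",
                    cnt_per_file: int = 20,
                    pb_files_count: int = 64) -> Dict[str, Any]:
    """Return the model_config keys and the storage entry needed for warm-up."""
    if cnt_per_file <= 0 or pb_files_count <= 0:
        raise ValueError("warm-up counts must be positive")
    return {
        "model_config": {
            "warmup_data_path": mount_path,
            "warmup_cnt_per_file": cnt_per_file,
            "warmup_pb_files_count": pb_files_count,
        },
        "storage": [
            # The mount must be writable so that captured requests can be saved.
            {"mount_path": mount_path, "oss": {"path": oss_path, "readOnly": False}},
        ],
    }


def total_warmup_requests(settings: Dict[str, Any]) -> int:
    """Requests replayed, assuming each file is sent warmup_cnt_per_file times."""
    cfg = settings["model_config"]
    return cfg["warmup_cnt_per_file"] * cfg["warmup_pb_files_count"]


if __name__ == "__main__":
    s = warmup_settings("oss://my-bucket/my-model/warmup/")  # hypothetical path
    print(json.dumps(s, indent=4))
    print(total_warmup_requests(s))  # 20 * 64 = 1280
```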
