Platform for AI: TorchEasyRec Processor

Last Updated: May 14, 2026

The built-in TorchEasyRec Processor in EAS lets you deploy recommendation models trained with TorchEasyRec or PyTorch as a scoring service with integrated feature engineering. By jointly optimizing feature engineering and the PyTorch model, the processor provides a high-performance scoring service. This topic describes how to deploy and invoke a TorchEasyRec model service.

Background

The following diagram shows the architecture of a recommendation engine based on TorchEasyRec Processor:

[Figure: architecture of a recommendation engine based on TorchEasyRec Processor]

The TorchEasyRec Processor includes the following modules:

  • Item Feature Cache: Caches item features from FeatureStore in memory to reduce network overhead, lower the load on FeatureStore, and improve inference service performance. If these item features include real-time features, FeatureStore handles their synchronization.

  • Feature Generator (FG): Applies the feature transformations defined in a configuration file. A single C++ codebase ensures consistent logic for both offline and online processing.

  • TorchModel: A PyTorch model, trained with TorchEasyRec or PyTorch and exported as a ScriptedModel.
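The role of the Item Feature Cache can be illustrated with a small read-through cache. This is only a sketch of the idea, not the processor's implementation: ItemFeatureCache and fetch_from_store are hypothetical names, and a plain dictionary stands in for FeatureStore.

```python
from typing import Callable, Dict, Iterable, List


class ItemFeatureCache:
    """Read-through cache: look in memory first, fall back to the store."""

    def __init__(self, fetch_from_store: Callable[[List[str]], Dict[str, dict]]):
        self._fetch = fetch_from_store  # hypothetical batch loader
        self._cache: Dict[str, dict] = {}
        self.store_calls = 0  # counts round trips to the backing store

    def get_many(self, item_ids: Iterable[str]) -> Dict[str, dict]:
        item_ids = list(item_ids)
        missing = [i for i in item_ids if i not in self._cache]
        if missing:
            # One batched call covers all cache misses.
            self.store_calls += 1
            self._cache.update(self._fetch(missing))
        # Items unknown to the store are simply absent from the result.
        return {i: self._cache[i] for i in item_ids if i in self._cache}


# A dictionary stands in for FeatureStore in this sketch.
FAKE_STORE = {"7033": {"price": 9.9}, "7034": {"price": 19.9}}


def fetch_from_store(ids: List[str]) -> Dict[str, dict]:
    return {i: FAKE_STORE[i] for i in ids if i in FAKE_STORE}


cache = ItemFeatureCache(fetch_from_store)
first = cache.get_many(["7033", "7034"])  # misses: one store round trip
second = cache.get_many(["7033"])         # hit: served from memory
print(cache.store_calls)
```

Only the first lookup reaches the backing store; the second is answered from memory, which is the effect the cache module aims for at serving time.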

Limitations

This feature supports only general-purpose instance families g6, g7, and g8, and GPU models such as T4 and A10. For more information, see general-purpose instance families (g series). If you deploy a GPU service, ensure that the CUDA Driver version is 535 or later.

Version history

The TorchEasyRec Processor is in active development. We recommend using the latest version to deploy your inference service, as newer versions provide more features and higher performance. A list of published versions is provided below:

Processor name

Release date

Torch version

FG version

Updates

easyrec-torch-0.1

2024-09-10

2.4

0.2.9

  • Supports Feature Generator (FG) and FeatureStore Item Feature Cache.

  • Supports CPU and GPU inference for Torch models.

  • Supports automatic expansion of INPUT_TILE user features.

  • Supports vector recall using Faiss.

  • Supports warm-up in normal mode.

easyrec-torch-0.2

2024-09-30

2.4

0.2.9

  • FeatureDB now supports complex types.

  • Accelerated data loading for FeatureStore initialization.

  • Optimized the debug_level in bypass mode.

  • Optimized host-to-device (H2D) data transfer.

easyrec-torch-0.3

2024-10-14

2.4

0.2.9

  • Supports JSON-based initialization in FeatureStore.

  • Redefined Protobuf definitions.

easyrec-torch-0.4

2024-10-28

2.4

0.3.1

  • Fixed an issue with complex types in Feature Generator (FG).

easyrec-torch-0.5

2024-11-14

2.4

0.3.1

  • Optimized online and offline consistency. When debug mode is enabled, the processor generates feature information after FG processing, regardless of whether the item exists.

easyrec-torch-0.6

2024-11-18

2.4

0.3.6

  • Optimized the packaging process by removing redundant header files.

easyrec-torch-0.7

2024-12-06

2.5

0.3.9

  • Supports array types for sequence primary keys.

  • Upgraded Torch to version 2.5.

  • Upgraded FG to version 0.3.9.

easyrec-torch-0.8

2024-12-25

2.5

0.3.9

  • Upgraded the TensorRT SDK to version 2.5.

  • Supports the int64 data type for model inputs.

  • Upgraded the FeatureStore version to resolve an issue with feature queries in Hologres.

  • Optimized runtime efficiency and logic for debugging.

  • Added item_features to the Protobuf definition to support passing item features in the request.

easyrec-torch-0.9

2025-01-15

2.5

0.4.1

  • Upgraded Feature Generator (FG) to version 0.4.1 to optimize initialization time in multi-threaded environments.

easyrec-torch-1.0

2025-02-06

2.5

0.4.2

  • Supports weighted features.

  • Upgraded Feature Generator (FG) to version 0.4.2.

  • Supports AMD CPUs.

easyrec-torch-1.1

2025-04-23

2.5

0.5.9

  • Upgraded the FeatureStore SDK. This version adds high-speed connectivity to FeatureDB over a VPC network and supports filtering expired real-time features in memory based on event_time and TTL.

  • Upgraded Feature Generator (FG). This version adds support for custom sequence features and fixes issues related to combo features.

easyrec-torch-1.2

2025-05-12

2.5

0.6.0

  • Upgraded FG to version 0.6.0.

  • Supports reading features from multiple FeatureStore entities, for example, config["fs_entity"] = "item,raw";.

  • In debug mode, the processor now outputs item IDs from the request that are not found in FeatureStore.

easyrec-torch-1.3

2025-05-29

2.5

0.6.5

  • Upgraded FG to version 0.6.5.

  • Added FSMAP support for weighted ID features.

  • Supports WordPiece tokenization.

  • Added a boolean_mask filter operator.

  • Enhanced the expression feature operator.

easyrec-torch-1.4

2025-07-15

2.5

0.6.9

  • Upgraded FG to version 0.6.9.

  • Added several new functions to the expression feature operator.

  • Moved the debug string generation logic from the processor to the FG library.

easyrec-torch-1.5

2025-09-18

2.5

0.7.3

  • Upgraded FG to version 0.7.3.

  • Supports capturing online requests for model warm-up.

  • Upgraded the FeatureStore SDK to version 20250826. This version supports three-level table schemas in MaxCompute, zero-trust calls without an AccessKey, and is compatible with adding features to a feature view.

easyrec-torch-1.6

2025-10-21

2.5

0.7.4

  • Optimized log control to prevent excessive logging from degrading performance during high callback request volumes.

  • Optimized context feature processing.

  • Feature preprocessing and FG now share a thread pool to conserve thread resources.

  • Upgraded FG to version 0.7.4.

easyrec-torch-1.7

2025-11-04

2.5

0.7.4

  • Optimized the logic for saving debug tensors to prevent excessive file creation triggered by callbacks.

easyrec-torch-1.8

2025-12-01

2.5

0.7.4

  • Optimized the FeatureStore SDK thread pool to prevent thread creation failures under high resource pressure.

  • Upgraded the FS SDK to version 20251117.

easyrec-torch-1.9

2026-01-09

2.5

1.0.0

  • Enabled CUDA multi-stream for GPU inference to improve system throughput and performance.

  • Upgraded FG to version 1.0.0.

easyrec-torch-1.10

2026-01-23

2.5

1.0.1

  • Enabled automatic logging of execution time for slow requests.

  • Added a configuration parameter to save request data when a slow request is detected.

easyrec-torch-1.11

2026-02-10

2.5

1.0.1

  • Fixed a memory contiguity issue with output tensors in specific scenarios.

  • Upgraded the FS SDK to version 20260202.

easyrec-torch-1.12

2026-03-13

2.5

1.0.1

  • When debug mode is enabled for a PAI-Rec engine request, the model service asynchronously saves the original request and the item-side features (both before and after FG processing) to disk in Protobuf format. Specify the save path with the request_log_path parameter, and mount an OSS bucket to this path at startup.

  • Upgraded the FS SDK to version 20260305.

Notes on version 2.0 and later

easyrec-torch-2.0 includes an upgraded GLIBC version in the EAS backend base image. Therefore, when you deploy version 2.0 or later of the processor:

  1. If you are creating a new EAS service, follow the standard deployment procedure. The process is identical to that for deploying versions 0.x and 1.x.

  2. If you are upgrading an existing EAS service that was created before March 15, 2026, you must contact an Alibaba Cloud technical expert to upgrade the backend base image of your service before you upgrade the processor. Otherwise, the deployment may fail due to an incompatible runtime environment.

easyrec-torch-2.0

2026-03-17

2.8

1.0.1

  • Upgraded the PyTorch runtime to 2.8.

  • Upgraded the CUDA runtime to 12.6.

  • Upgraded the FBGEMM_GPU runtime to 1.3.

  • Upgraded the GLIBC version in the base image to 2.38.

easyrec-torch-2.1

2026-04-09

2.8

1.0.2

  • Fixed online and offline consistency issues caused by missing feature values.

  • Set the default value of the fg_threads parameter to the number of logical CPU cores.

  • Supports capturing performance logs using the Kineto profiler.

  • Upgraded the FS SDK to version 20260402.

easyrec-torch-2.2

2026-04-29

2.8

1.0.5

  • Supports DLRM-HSTU inference.

  • Fixed a deployment error that occurred in CPU-only environments for versions 2.0 and 2.1.

  • Upgraded the FS SDK to version 20260416.
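Processor names embed a two-part version, and comparing them as plain strings gives the wrong order (for example, 1.9 sorts after 1.12). The following sketch parses the name into a numeric tuple and encodes the base-image note for version 2.0 and later. The function names are illustrative, and the cut-off date is taken from that note.

```python
import re
from datetime import date
from typing import Tuple

_NAME = re.compile(r"^easyrec-torch-(\d+)\.(\d+)$")


def parse_version(processor: str) -> Tuple[int, int]:
    """Turn 'easyrec-torch-1.12' into (1, 12) for numeric comparison."""
    match = _NAME.match(processor)
    if match is None:
        raise ValueError(f"not a TorchEasyRec processor name: {processor!r}")
    return int(match.group(1)), int(match.group(2))


def needs_base_image_upgrade(processor: str, service_created: date) -> bool:
    """True when an existing service needs its base image upgraded first.

    Encodes the note for version 2.0 and later: services created before
    March 15, 2026 need the backend base image upgraded.
    """
    return parse_version(processor) >= (2, 0) and service_created < date(2026, 3, 15)


names = ["easyrec-torch-1.9", "easyrec-torch-1.12", "easyrec-torch-1.10"]
print(sorted(names))                     # string order: 1.10, 1.12, 1.9
print(sorted(names, key=parse_version))  # numeric order: 1.9, 1.10, 1.12
print(needs_base_image_upgrade("easyrec-torch-2.1", date(2025, 12, 1)))
```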

Step 1: Deploy a service

  1. Prepare the service configuration file torcheasyrec.json.

    Set the processor type to easyrec-torch-{version}, where {version} is one of the versions listed in Version history. The following examples show JSON configurations for both FG modes:

    Example with FG (fg_mode='normal')

    {
      "metadata": {
        "instance": 1,
        "name": "alirec_rank_with_fg",
        "rpc": {
          "enable_jemalloc": 1,
          "max_queue_size": 256,
          "worker_threads": 16
        }
      },
      "cloud": {
        "computing": {
          "instance_type": "ecs.gn6i-c16g1.4xlarge"
        }
      },
      "model_config": {
        "fg_mode": "normal",
        "fg_threads": 8,
        "region": "YOUR_REGION",
        "fs_project": "YOUR_FS_PROJECT",
        "fs_model": "YOUR_FS_MODEL",
        "fs_entity": "item",
        "load_feature_from_offlinestore": true,
        "access_key_id":"YOUR_ACCESS_KEY_ID",
        "access_key_secret":"YOUR_ACCESS_KEY_SECRET"
      },
      "storage": [
        {
          "mount_path": "/home/admin/docker_ml/workspace/model/",
          "oss": {
            "path": "oss://xxx/xxx/export",
            "readOnly": false
          },
          "properties": {
            "resource_type": "code"
          }
        }
      ],
      "processor":"easyrec-torch-1.12"
    }

    Example without FG (fg_mode='bypass')

    {
      "metadata": {
        "instance": 1,
        "name": "alirec_rank_no_fg",
        "rpc": {
          "enable_jemalloc": 1,
          "max_queue_size": 256,
          "worker_threads": 16
        }
      },
      "cloud": {
        "computing": {
          "instance_type": "ecs.gn6i-c16g1.4xlarge"
        }
      },
      "model_config": {
        "fg_mode": "bypass"
      },
      "storage": [
        {
          "mount_path": "/home/admin/docker_ml/workspace/model/",
          "oss": {
            "path": "oss://xxx/xxx/export",
            "readOnly": false
          },
          "properties": {
            "resource_type": "code"
          }
        }
      ],
      "processor":"easyrec-torch-1.12"
    }

    The following table describes the key parameters. For other parameters, see JSON Deployment.

    Parameter

    Required

    Description

    Example

    processor

    Yes

    The TorchEasyRec processor.

    "processor":"easyrec-torch-1.12"

    path

    Yes

    The OSS path mounted to the service to store model files.

    "path": "oss://examplebucket/xxx/export"

    fg_mode

    No

    Specifies the feature engineering (FG) mode. Valid values:

    • bypass (default): Disables FG. Only the Torch model is deployed.

      • This mode is suitable for custom feature processing scenarios.

      • In this mode, you do not need to configure parameters for the processor to access FeatureStore.

    • normal: Enables FG. This mode is typically used with TorchEasyRec for model training.

    "fg_mode": "normal"

    fg_threads

    No

    The number of concurrent FG threads per request.

    "fg_threads": 15

    outputs

    No

    The names of output variables from the Torch model, such as probs_ctr. Separate multiple names with a comma (,). If unspecified, all variables are returned.

    "outputs":"probs_ctr,probs_cvr"

    item_empty_score

    No

    The default score to return when an item ID does not exist. Default value: 0.

    "item_empty_score": -1

    Processor recall parameters

    faiss_neigh_num

    No

    The number of items to retrieve for FAISS vector recall. The value is taken from the faiss_neigh_num field in the request. If this field is not provided, the value of faiss_neigh_num in the model_config configuration is used, which defaults to 1.

    "faiss_neigh_num": 200

    faiss_nprobe

    No

    The number of clusters to search during retrieval (the FAISS nprobe parameter). Default value: 800. In FAISS, an inverted file index partitions the data into clusters and maintains an inverted list for each cluster. A larger nprobe value usually improves search accuracy but increases computational cost and search time. A smaller value speeds up the search at the cost of accuracy.

    "faiss_nprobe": 700

    Processor parameters for FeatureStore access

    fs_project

    No

    The name of the FeatureStore project. Required when using FeatureStore. For more information, see Configure a FeatureStore project.

    "fs_project": "fs_demo"

    fs_model

    No

    The name of the feature model in FeatureStore.

    "fs_model": "fs_rank_v1"

    fs_entity

    No

    The name of the entity in FeatureStore.

    "fs_entity": "item"

    region

    No

    The region where the FeatureStore product is located. For example, specify cn-beijing for the China (Beijing) region. For more information about regions, see Endpoints.

    "region": "cn-beijing"

    access_key_id

    No

    The AccessKey ID for accessing FeatureStore.

    "access_key_id": "xxxxx"

    access_key_secret

    No

    The AccessKey Secret for accessing FeatureStore.

    "access_key_secret": "xxxxx"

    load_feature_from_offlinestore

    No

    Specifies whether to load offline features directly from the FeatureStore OfflineStore. Valid values:

    • true: Loads data from the FeatureStore OfflineStore.

    • false (default): Loads data from the FeatureStore OnlineStore.

    "load_feature_from_offlinestore": true

    featuredb_username

    No

    The username for FeatureDB.

    "featuredb_username":"xxx"

    featuredb_password

    No

    The password for FeatureDB.

    "featuredb_password":"xxx"

    input_tile: Automatic feature expansion

    INPUT_TILE

    No

    Enables automatic feature expansion. For features that have the same value across all items in a single request, such as user_id, you can send the value only once. This reduces the request size, network latency, and computation time.

    This feature requires normal mode and a model exported from TorchEasyRec with the corresponding environment variable set. By default, the system reads the INPUT_TILE value from the model_acc.json file in the exported model directory. If this file does not exist, the value is read from the environment variable.

    When enabled:

    • If set to 2: FG for user-side features is calculated only once.

    • If set to 3: FG for user-side features is calculated only once. The system calculates embeddings for user and item features separately, and the user-side embedding is also calculated only once. This is suitable for scenarios with a large number of user-side features.

    "processor_envs":

    [

    {

    "name": "INPUT_TILE",

    "value": "2"

    }

    ]

    NO_GRAD_GUARD

    No

    Disables gradient calculation during inference. This stops operation tracking and prevents the construction of a computation graph.

    Note

    Setting this to 1 might cause incompatibility issues with some models. If the service hangs during a second inference run, you can resolve the issue by setting the PYTORCH_TENSOREXPR_FALLBACK=2 environment variable. This bypasses the compilation step while retaining some graph optimization functionality.

    "processor_envs":

    [

    {

    "name": "NO_GRAD_GUARD",

    "value": "1"

    }

    ]

    Model warm-up parameters

    warmup_data_path

    No

    Enables the model warm-up feature and specifies the path to save warm-up files. To prevent these files from being lost, you must mount an OSS path to this location in the storage configuration.

    "warmup_data_path": "/warmup"

    warmup_cnt_per_file

    No

    The number of warm-up iterations to run for each warm-up Protobuf (PB) file. A larger value gives a more thorough warm-up but increases the warm-up time. Default value: 20.

    "warmup_cnt_per_file": 20

    warmup_pb_files_count

    No

    The number of online requests to save as PB files for warm-up at the next service startup. The files are saved to the path specified by warmup_data_path. Default value: 64.

    "warmup_pb_files_count": 64

    Slow request logging and saving

    long_request_threshold

    No

    The time threshold in milliseconds (ms) for identifying a slow request. If a request's processing time exceeds this threshold, the system automatically records the execution time of each stage in the logs. Default value: 200.

    "long_request_threshold": 200

    save_long_request

    No

    Specifies whether to save requests that exceed the long_request_threshold as PB files. The default value is false. The PB files are saved to the torch_req folder under the model directory.

    "save_long_request": true

    Request and feature logging to OSS

    request_log_path

    No

    The disk path where the request and feature PB files are saved when debug mode is enabled for a PAI-Rec engine request. In the model service configuration, you must mount an OSS path to this location.

    "request_log_path": "/online_log_pb"

    background_feature_thread_num

    No

    The number of background threads for saving files to disk. If the disk-writing workload is heavy, you can increase this value to improve the throughput for saving PB files. Default value: 4.

    "background_feature_thread_num": 8

  2. Deploy the TorchEasyRec model service. You can use one of the following methods:

    JSON Deployment (Recommended)

    Follow these steps:

    1. Log on to the PAI console. Select a region on the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).

    2. On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Custom Model Deployment section, click JSON Deployment.

    3. In the JSON editor, paste your JSON configuration and click Deploy.

    eascmd client

    1. Download and authenticate the client. This example uses the 64-bit Windows version of the client.

    2. Open a terminal. In the directory containing the JSON file, run the following command to create the service. For more command information, see Command reference.

      eascmdwin64.exe create <service.json>

      Replace <service.json> with your JSON file name, such as torcheasyrec.json.
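The service configuration from step 1 can also be generated and checked by a script before it is pasted into the console or passed to eascmd. The following is a minimal sketch: build_service_config and check_model_config are hypothetical helpers, and the rule that normal mode needs the four FeatureStore fields is an assumption drawn from the examples in this topic, not an official validation rule.

```python
import json
from typing import Dict, List

# FeatureStore fields this sketch expects in normal mode (an assumption
# drawn from the normal-mode example, not an official validation rule).
FEATURESTORE_KEYS = ("region", "fs_project", "fs_model", "fs_entity")


def check_model_config(model_config: Dict[str, object]) -> List[str]:
    """Return a list of human-readable problems; empty means no findings."""
    problems = []
    fg_mode = model_config.get("fg_mode", "bypass")  # bypass is the default
    if fg_mode not in ("bypass", "normal"):
        problems.append(f"fg_mode must be 'bypass' or 'normal', got {fg_mode!r}")
    if fg_mode == "normal":
        for key in FEATURESTORE_KEYS:
            if not model_config.get(key):
                problems.append(f"normal mode: missing {key}")
    return problems


def build_service_config(name, processor, instance_type, oss_path, model_config):
    """Assemble a service description shaped like the examples in step 1."""
    return {
        "metadata": {"instance": 1, "name": name},
        "cloud": {"computing": {"instance_type": instance_type}},
        "model_config": model_config,
        "storage": [
            {
                "mount_path": "/home/admin/docker_ml/workspace/model/",
                "oss": {"path": oss_path, "readOnly": False},
                "properties": {"resource_type": "code"},
            }
        ],
        "processor": processor,
    }


config = build_service_config(
    name="alirec_rank_no_fg",
    processor="easyrec-torch-1.12",
    instance_type="ecs.gn6i-c16g1.4xlarge",
    oss_path="oss://examplebucket/xxx/export",
    model_config={"fg_mode": "bypass"},
)
print(check_model_config(config["model_config"]))
# json.dumps writes Python's False as JSON's lowercase false.
print(json.dumps(config, indent=2))
```

Writing the result of json.dumps to torcheasyrec.json yields a file that follows the same layout as the hand-written examples. Generating it this way avoids JSON slips such as capitalized booleans or trailing commas.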

Step 2: Call the service

After deploying the TorchEasyRec model service, follow these steps to view the service call information:

  1. Log in to the PAI console, select the region at the top of the page and the workspace on the right, then click Enter EAS.

  2. In the Service Type column, click Invocation Information to view the service endpoint and token.

The TorchEasyRec model service uses Protobuf for input and output. The calling method depends on whether FG is enabled:

With FG (fg_mode='normal')

The service supports the following two methods:

EAS Java SDK

Before you run the code, configure your Maven environment. For details, see Java SDK usage instructions. For the latest version of the EAS Java SDK, see https://github.com/pai-eas/eas-java-sdk. The following code shows how to send a request to the alirec_rank_with_fg service:

package com.aliyun.openservices.eas.predict;

import com.aliyun.openservices.eas.predict.http.Compressor;
import com.aliyun.openservices.eas.predict.http.HttpConfig;
import com.aliyun.openservices.eas.predict.http.PredictClient;
import com.aliyun.openservices.eas.predict.proto.TorchRecPredictProtos;
import com.aliyun.openservices.eas.predict.request.TorchRecRequest;
import com.aliyun.openservices.eas.predict.proto.TorchPredictProtos.ArrayProto;

import java.util.*;


public class TorchRecPredictTest {
    public static PredictClient InitClient() {
        return new PredictClient(new HttpConfig());
    }

    public static TorchRecRequest buildPredictRequest() {
        // Use a lowerCamelCase variable name so that it does not obscure the class name.
        TorchRecRequest torchRecRequest = new TorchRecRequest();
        // ID of the candidate item to score.
        torchRecRequest.appendItemId("7033");

        // Scalar user feature.
        torchRecRequest.addUserFeature("user_id", 33981,"int");

        // List-valued user feature.
        ArrayList<Double> list = new ArrayList<>();
        list.add(0.24689289764507472);
        list.add(0.005758482924454689);
        list.add(0.6765301324940026);
        list.add(0.18137273055602343);
        torchRecRequest.addUserFeature("raw_3", list,"List<double>");

        // Map-valued user feature.
        Map<String,Integer> myMap =new LinkedHashMap<>();
        myMap.put("866", 4143);
        myMap.put("1627", 2451);
        torchRecRequest.addUserFeature("map_1", myMap,"map<string,int>");

        // Nested-list user feature.
        ArrayList<ArrayList<Float>> list2 = new ArrayList<>();
        ArrayList<Float> innerList1 = new ArrayList<>();
        innerList1.add(1.1f);
        innerList1.add(2.2f);
        innerList1.add(3.3f);
        list2.add(innerList1);
        ArrayList<Float> innerList2 = new ArrayList<>();
        innerList2.add(4.4f);
        innerList2.add(5.5f);
        list2.add(innerList2);
        torchRecRequest.addUserFeature("click", list2,"list<list<float>>");

        // Context features.
        torchRecRequest.addContextFeature("id_2", list,"List<double>");
        torchRecRequest.addContextFeature("id_2", list,"List<double>");

        // Print the underlying Protobuf request for inspection.
        System.out.println(torchRecRequest.request);
        return torchRecRequest;
    }

    public static void main(String[] args) throws Exception{
        PredictClient client = InitClient();
        client.setToken("tokenGeneratedFromService");
        client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com");
        client.setModelName("alirec_rank_with_fg");
        client.setRequestTimeout(100000);


        testInvoke(client);
        testDebugLevel(client);
        client.shutdown();
    }

    public static void testInvoke(PredictClient client) throws Exception {
        long startTime = System.currentTimeMillis();
        TorchRecPredictProtos.PBResponse response = client.predict(buildPredictRequest());
        for (Map.Entry<String, ArrayProto> entry : response.getMapOutputsMap().entrySet()) {

            System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");

    }

    public static void testDebugLevel(PredictClient client) throws Exception {
        long startTime = System.currentTimeMillis();
        TorchRecRequest request = buildPredictRequest();
        request.setDebugLevel(1);
        TorchRecPredictProtos.PBResponse response = client.predict(request);
        Map<String, String> genFeas = response.getGenerateFeaturesMap();
        for(String itemId: genFeas.keySet()) {
            System.out.println(itemId);
            System.out.println(genFeas.get(itemId));
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");

    }
}

The key parameters are as follows:

  • client.setToken("tokenGeneratedFromService"): Replace the value with your service token. For example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.

  • client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com"): Replace the value with your service endpoint. For example, 175805416243****.cn-beijing.pai-eas.aliyuncs.com.

  • client.setModelName("alirec_rank_with_fg"): Replace the value with your service name.

EAS Python SDK

Before you run the code, run the pip install -U eas-prediction --user command to install or update the eas-prediction library. For more information, see Python SDK usage instructions. For the source code, see https://github.com/pai-eas/eas-python-sdk/blob/master/eas_prediction/torchrec_request.py. The following code is an example:

from eas_prediction import PredictClient
from eas_prediction.torchrec_request import TorchRecRequest


if __name__ == '__main__':
    endpoint = 'http://localhost:6016'

    client = PredictClient(endpoint, '<your_service_name>')
    client.set_token('<your_service_token>')
    client.init()
    torchrec_req = TorchRecRequest()

    torchrec_req.add_user_fea('user_id', 'u001d', "STRING")
    torchrec_req.add_user_fea('age', 12, "INT")
    torchrec_req.add_user_fea('weight', 129.8, "FLOAT")
    torchrec_req.add_item_id('item_0001')
    torchrec_req.add_item_id('item_0002')
    torchrec_req.add_item_id('item_0003')
    torchrec_req.add_user_fea("raw_3", [0.24689289764507472, 0.005758482924454689, 0.6765301324940026, 0.18137273055602343], "list<double>")
    torchrec_req.add_user_fea("raw_4", [0.9965264740966043, 0.659596586238391, 0.16396649403055896, 0.08364986620265635], "list<double>")
    torchrec_req.add_user_fea("map_1", {"0":0.37845234405201145}, "map<int,float>")
    torchrec_req.add_user_fea("map_2", {"866":4143,"1627":2451}, "map<int,int>")
    torchrec_req.add_context_fea("id_2", [866], "list<int>" )
    torchrec_req.add_context_fea("id_2", [7022,1], "list<int>" )
    torchrec_req.add_context_fea("id_2", [7022,1], "list<int>" )
    torchrec_req.add_user_fea("click", [[0.94433516,0.49145547], [0.94433516, 0.49145597]], "list<list<float>>")

    res = client.predict(torchrec_req)
    print(res)

The key parameters are as follows:

  • endpoint: Set this parameter to your service endpoint. For example, http://175805416243****.cn-beijing.pai-eas.aliyuncs.com/.

  • <your_service_name>: Replace this placeholder with your service name.

  • <your_service_token>: Replace this placeholder with your service token. For example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
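Instead of editing the placeholders in the source file, the three values can be read from the environment so that the token does not end up in version control. The following is a minimal sketch: the variable names EAS_ENDPOINT, EAS_SERVICE_NAME, and EAS_TOKEN are hypothetical and chosen only for this example.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names chosen for this sketch.
REQUIRED_VARS = ("EAS_ENDPOINT", "EAS_SERVICE_NAME", "EAS_TOKEN")


def load_client_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect endpoint, service name, and token; fail fast if one is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "endpoint": env["EAS_ENDPOINT"],
        "service_name": env["EAS_SERVICE_NAME"],
        "token": env["EAS_TOKEN"],
    }


# Demonstration with an explicit mapping instead of the real environment.
settings = load_client_settings(
    {
        "EAS_ENDPOINT": "http://localhost:6016",
        "EAS_SERVICE_NAME": "alirec_rank_with_fg",
        "EAS_TOKEN": "example-token",
    }
)
print(settings["endpoint"])
```

The returned values can then be passed to PredictClient(settings["endpoint"], settings["service_name"]) and client.set_token(settings["token"]), as in the example above.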

Without FG (fg_mode='bypass')

EAS Java SDK

Before you run the code, configure your Maven environment. For details, see Java SDK Usage Instructions. For the latest SDK version, see the project on GitHub. The following sample code sends a request to the alirec_rank_no_fg service:

package com.aliyun.openservices.eas.predict;

import java.util.List;
import java.util.Arrays;


import com.aliyun.openservices.eas.predict.http.PredictClient;
import com.aliyun.openservices.eas.predict.http.HttpConfig;
import com.aliyun.openservices.eas.predict.request.TorchDataType;
import com.aliyun.openservices.eas.predict.request.TorchRequest;
import com.aliyun.openservices.eas.predict.response.TorchResponse;

public class Test_Torch {
    public static PredictClient InitClient() {
        return new PredictClient(new HttpConfig());
    }

    public static TorchRequest buildPredictRequest() {
        TorchRequest request = new TorchRequest();
        float[] content = new float[2304000];
        for (int i = 0; i < content.length; i++) {
            content[i] = (float) 0.0;
        }
        long[] content_i = new long[900];
        for (int i = 0; i < content_i.length; i++) {
            content_i[i] = 0;
        }

        long[] a = Arrays.copyOfRange(content_i, 0, 300);
        float[] b = Arrays.copyOfRange(content, 0, 230400);
        request.addFeed(0, TorchDataType.DT_INT64, new long[]{300,3}, content_i);
        request.addFeed(1, TorchDataType.DT_FLOAT, new long[]{300,10,768}, content);
        request.addFeed(2, TorchDataType.DT_FLOAT, new long[]{300,768}, b);
        request.addFeed(3, TorchDataType.DT_INT64, new long[]{300}, a);
        request.addFetch(0);
        request.setDebugLevel(903);
        return request;
    }

    public static void main(String[] args) throws Exception {
        PredictClient client = InitClient();
        client.setToken("tokenGeneratedFromService");
        client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com");
        client.setModelName("alirec_rank_no_fg");
        client.setIsCompressed(false);
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            TorchResponse response = null;
            try {
                response = client.predict(buildPredictRequest());
                List<Float> result = response.getFloatVals(0);
                System.out.print("Predict Result: [");
                for (int j = 0; j < result.size(); j++) {
                    System.out.print(result.get(j).floatValue());
                    if (j != result.size() - 1) {
                        System.out.print(", ");
                    }
                }
                System.out.print("]\n");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");
        client.shutdown();
    }
}

The key parameters are as follows:

  • client.setToken("tokenGeneratedFromService"): Replace the value in parentheses with your service token. For example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.

  • client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com"): Replace the value in parentheses with your service endpoint. For example, 175805416243****.cn-beijing.pai-eas.aliyuncs.com.

  • client.setModelName("alirec_rank_no_fg"): Replace the value in parentheses with your service name.

EAS Python SDK

Before you run the code, run the pip install -U eas-prediction --user command to install or update the eas-prediction library. For more configuration details, see Python SDK Instructions. The sample code for requesting the alirec_rank_no_fg service is as follows:

from eas_prediction import PredictClient
from eas_prediction import TorchRequest

# snappy data
req = TorchRequest(False)

req.add_feed(0, [300, 3], TorchRequest.DT_INT64, [1] * 900)
req.add_feed(1, [300, 10, 768], TorchRequest.DT_FLOAT, [1.0] * 3 * 768000)
req.add_feed(2, [300, 768], TorchRequest.DT_FLOAT, [1.0] * 3 * 76800)
req.add_feed(3, [300], TorchRequest.DT_INT64, [1] * 300)


client = PredictClient('<your_endpoint>', '<your_service_name>')
client.set_token('<your_service_token>')

client.init()

resp = client.predict(req)
print(resp)

The key parameters are as follows:

  • <your_endpoint>: Replace this placeholder with your service endpoint. For example, http://175805416243****.cn-beijing.pai-eas.aliyuncs.com/.

  • <your_service_name>: Replace this placeholder with your service name.

  • <your_service_token>: Replace this placeholder with your service token. For example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.
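In bypass mode each feed is a flat list of values plus a shape, so a mismatch between the two is an easy mistake to make. The sketch below checks the four feeds from the Python example before they are sent. check_feed is a hypothetical helper, and it assumes that the service expects exactly as many values as the product of the shape dimensions.

```python
import math
from typing import Sequence


def expected_length(shape: Sequence[int]) -> int:
    """Number of elements a tensor of this shape holds."""
    return math.prod(shape)


def check_feed(shape: Sequence[int], values: Sequence) -> None:
    """Raise ValueError when the flat value list does not fill the shape."""
    expected = expected_length(shape)
    if len(values) != expected:
        raise ValueError(
            f"shape {list(shape)} needs {expected} values, got {len(values)}"
        )


# The four feeds from the Python example above.
feeds = [
    ([300, 3], [1] * 900),
    ([300, 10, 768], [1.0] * 3 * 768000),
    ([300, 768], [1.0] * 3 * 76800),
    ([300], [1] * 300),
]
for shape, values in feeds:
    check_feed(shape, values)
print([expected_length(shape) for shape, _ in feeds])
# prints [900, 2304000, 230400, 300]
```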

For details on the status codes that the service returns, see Service Status Codes. To construct a service request, see the Request Format.

Request format

If you do not use an EAS SDK, you can build requests yourself: generate client code from the following Protobuf definitions and use the generated classes to construct the request.

pytorch_predict.proto: Torch model request

syntax = "proto3";

package pytorch.eas;
option cc_enable_arenas = true;
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "TorchPredictProtos";

enum ArrayDataType {
  // Not a legal value for DataType. Used to indicate a DataType field
  // has not been set.
  DT_INVALID = 0;
  
  // Data types that all computation devices are expected to support.
  DT_FLOAT = 1;
  DT_DOUBLE = 2;
  DT_INT32 = 3;
  DT_UINT8 = 4;
  DT_INT16 = 5;
  DT_INT8 = 6;
  DT_STRING = 7;
  DT_COMPLEX64 = 8;  // Single-precision complex
  DT_INT64 = 9;
  DT_BOOL = 10;
  DT_QINT8 = 11;     // Quantized int8
  DT_QUINT8 = 12;    // Quantized uint8
  DT_QINT32 = 13;    // Quantized int32
  DT_BFLOAT16 = 14;  // Float32 truncated to 16 bits.  Only for cast ops.
  DT_QINT16 = 15;    // Quantized int16
  DT_QUINT16 = 16;   // Quantized uint16
  DT_UINT16 = 17;
  DT_COMPLEX128 = 18;  // Double-precision complex
  DT_HALF = 19;
  DT_RESOURCE = 20;
  DT_VARIANT = 21;  // Arbitrary C++ data types
}

// Dimensions of an array.
message ArrayShape {
  repeated int64 dim = 1 [packed = true];
}

// Protocol buffer representing an array.
message ArrayProto {
  // Data type.
  ArrayDataType dtype = 1;

  // Shape of the array.
  ArrayShape array_shape = 2;

  // DT_FLOAT.
  repeated float float_val = 3 [packed = true];

  // DT_DOUBLE.
  repeated double double_val = 4 [packed = true];

  // DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
  repeated int32 int_val = 5 [packed = true];

  // DT_STRING.
  repeated bytes string_val = 6;

  // DT_INT64.
  repeated int64 int64_val = 7 [packed = true];

}


message PredictRequest {

  // Input tensors.
  repeated ArrayProto inputs = 1;

  // Output filter.
  repeated int32 output_filter = 2;

  // Input tensors for recommendation.
  map<string, ArrayProto> map_inputs = 3;

  // Debug level for recommendation.
  int32 debug_level = 100;
}

// Response to a successful PredictRequest.
message PredictResponse {
  // Output tensors.
  repeated ArrayProto outputs = 1;
  // Output tensors for recommendation.
  map<string, ArrayProto> map_outputs = 2;
}
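
The comments in ArrayProto tie each dtype to one value field. The following sketch restates that mapping in plain Python, without the generated Protobuf classes. The helper name to_array_dict is hypothetical, and the check that the product of the dimensions equals the number of values is an assumption rather than a rule stated in the .proto file.

```python
from math import prod

# Value field used for each dtype, taken from the comments in ArrayProto.
DTYPE_TO_FIELD = {
    "DT_FLOAT": "float_val",
    "DT_DOUBLE": "double_val",
    "DT_INT32": "int_val",
    "DT_INT16": "int_val",
    "DT_INT8": "int_val",
    "DT_UINT8": "int_val",
    "DT_STRING": "string_val",
    "DT_INT64": "int64_val",
}


def to_array_dict(dtype, shape, values):
    """Build a plain dict shaped like ArrayProto (hypothetical helper).

    `values` is the flat list of elements. Requiring prod(shape) to equal
    len(values) is an assumption made for this sketch.
    """
    field = DTYPE_TO_FIELD[dtype]  # KeyError for dtypes without a value field
    if prod(shape) != len(values):
        raise ValueError(f"shape {list(shape)} does not match {len(values)} values")
    return {
        "dtype": dtype,
        "array_shape": {"dim": list(shape)},
        field: list(values),
    }


scores = to_array_dict("DT_FLOAT", [2, 3], [0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
print(scores["array_shape"], len(scores["float_val"]))
```

The same field selection applies when you fill the generated ArrayProto class: set dtype and array_shape, then append the values to the field that matches the dtype.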

torchrec_predict.proto: Torch model with FG request

syntax = "proto3";

option go_package = ".;torch_predict_protos";
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "TorchRecPredictProtos";
package com.alibaba.pairec.processor;
import "pytorch_predict.proto";

// Mappings from int64 to other types.
message LongStringMap {
  map<int64, string> map_field = 1;
}
message LongIntMap {
  map<int64, int32> map_field = 1;
}
message LongLongMap {
  map<int64, int64> map_field = 1;
}
message LongFloatMap {
  map<int64, float> map_field = 1;
}
message LongDoubleMap {
  map<int64, double> map_field = 1;
}

// Mappings from string to other types.
message StringStringMap {
  map<string, string> map_field = 1;
}
message StringIntMap {
  map<string, int32> map_field = 1;
}
message StringLongMap {
  map<string, int64> map_field = 1;
}
message StringFloatMap {
  map<string, float> map_field = 1;
}
message StringDoubleMap {
  map<string, double> map_field = 1;
}

// Mappings from int32 to other types.
message IntStringMap {
  map<int32, string> map_field = 1;
}
message IntIntMap {
  map<int32, int32> map_field = 1;
}
message IntLongMap {
  map<int32, int64> map_field = 1;
}
message IntFloatMap {
  map<int32, float> map_field = 1;
}
message IntDoubleMap {
  map<int32, double> map_field = 1;
}

// Single-level list types.
message IntList {
  repeated int32 features = 1;
}
message LongList {
  repeated int64 features = 1;
}

message FloatList {
  repeated float features = 1;
}
message DoubleList {
  repeated double features = 1;
}
message StringList {
  repeated string features = 1;
}

// Nested list types.
message IntLists {
  repeated IntList lists = 1;
}
message LongLists {
  repeated LongList lists = 1;
}

message FloatLists {
  repeated FloatList lists = 1;
}
message DoubleLists {
  repeated DoubleList lists = 1;
}
message StringLists {
  repeated StringList lists = 1;
}

message PBFeature {
  oneof value {
    int32 int_feature = 1;
    int64 long_feature = 2;
    string string_feature = 3;
    float float_feature = 4;
    double double_feature = 5;

    LongStringMap long_string_map = 6; 
    LongIntMap long_int_map = 7; 
    LongLongMap long_long_map = 8; 
    LongFloatMap long_float_map = 9; 
    LongDoubleMap long_double_map = 10; 
    
    StringStringMap string_string_map = 11; 
    StringIntMap string_int_map = 12; 
    StringLongMap string_long_map = 13; 
    StringFloatMap string_float_map = 14; 
    StringDoubleMap string_double_map = 15; 

    IntStringMap int_string_map = 16; 
    IntIntMap int_int_map = 17; 
    IntLongMap int_long_map = 18; 
    IntFloatMap int_float_map = 19; 
    IntDoubleMap int_double_map = 20; 

    IntList int_list = 21; 
    LongList long_list = 22;
    StringList string_list = 23;
    FloatList float_list = 24;
    DoubleList double_list = 25;

    IntLists int_lists = 26;
    LongLists long_lists = 27;
    StringLists string_lists = 28;
    FloatLists float_lists = 29;
    DoubleLists double_lists = 30;
    
  }
}

// Context features.
message ContextFeatures {
  repeated PBFeature features = 1;
}

// PBRequest specifies the request for the aggregator.
message PBRequest {
  // Debug mode.
  int32 debug_level = 1;

  // User features, keyed by the user input name.
  map<string, PBFeature> user_features = 2;

  // Item IDs.
  repeated string item_ids = 3;

  // Context features for each item, keyed by the context input name.
  map<string, ContextFeatures> context_features = 4;

  // Number of nearest neighbors (items) to retrieve from Faiss.
  int32 faiss_neigh_num = 5;

  // Item features for each item, keyed by the item input name.
  map<string, ContextFeatures> item_features = 6;
  
  // Optional metadata.
  map<string, string> meta_data = 7;
}

// PBResponse specifies the response from the aggregator.
message PBResponse {
  // Output tensors from the Torch model.
  map<string, pytorch.eas.ArrayProto> map_outputs = 1;

  // Output features generated by FG.
  map<string, string> generate_features = 2;

  // All input features processed by FG.
  map<string, string> raw_features = 3;

  // Item IDs.
  repeated string item_ids = 4;

}
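
To make the shape of PBRequest concrete, the following sketch writes a request as a plain Python dict that mirrors the message fields, then checks it before it would be converted to Protobuf. The feature names (user_id, age, position) and the helper check_request are hypothetical. The rule that each context or item feature carries exactly one value per entry in item_ids is an assumption inferred from the "for each item" comments in the definition.

```python
def check_request(request):
    """Return a list of problems found in a PBRequest-shaped dict.

    Assumption: every entry of context_features and item_features holds
    one PBFeature per item, in the same order as item_ids.
    """
    problems = []
    n_items = len(request.get("item_ids", []))
    for section in ("context_features", "item_features"):
        for name, ctx in request.get(section, {}).items():
            n_values = len(ctx.get("features", []))
            if n_values != n_items:
                problems.append(
                    f"{section}[{name!r}] has {n_values} values for {n_items} items"
                )
    for name, feature in request.get("user_features", {}).items():
        # PBFeature.value is a oneof: at most one field can be set, and an
        # empty feature carries no value, so this sketch expects exactly one.
        if len(feature) != 1:
            problems.append(f"user_features[{name!r}] must set exactly one field")
    return problems


request = {
    "debug_level": 0,
    "user_features": {
        "user_id": {"string_feature": "u_1001"},  # hypothetical feature names
        "age": {"int_feature": 32},
    },
    "item_ids": ["i_1", "i_2"],
    "context_features": {
        "position": {"features": [{"int_feature": 1}, {"int_feature": 2}]},
    },
}
print(check_request(request))
```

A check like this can catch mismatched list lengths on the client side before the request reaches the service.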

The debug_level parameter accepts the following values:

Note

This parameter is optional and for debugging only.

| debug_level | Description |
| --- | --- |
| 0 | Performs a standard prediction. |
| 1 | In normal mode, validates request keys, performs shape validation on FG inputs and outputs, and returns the input and output features without performing a prediction. |
| 2 | In normal mode, validates request keys, performs shape validation on FG inputs and outputs, returns the input features, output features, and the model input tensor, and performs a prediction. |
| 3 | In normal mode, validates request keys, performs shape validation on FG inputs and outputs, and returns the output features without performing a prediction. |
| 100 | In normal mode, persists the prediction request to disk as a Protobuf file. The request_log_path parameter specifies the file path. The file contains the original request and the item-side input and output features. |
| 102 | In normal mode, performs vector recall: validates request keys, performs shape validation on FG inputs and outputs, and saves the input features, output features, the model input tensor, and the user embedding. |
| 903 | Logs the prediction time for each stage. |
| 904 | Validates the request and logs any missing feature fields. |
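
Because the listed debug_level values are sparse integers, client code may benefit from naming them. The following is a minimal sketch: the member names are illustrative and are not defined by the processor, and rejecting unlisted values is a client-side choice made here, not documented service behavior.

```python
from enum import IntEnum


class DebugLevel(IntEnum):
    """debug_level values listed above (member names are illustrative)."""

    PREDICT = 0
    FEATURES_ONLY = 1
    FEATURES_TENSOR_AND_PREDICT = 2
    OUTPUT_FEATURES_ONLY = 3
    SAVE_REQUEST = 100
    VECTOR_RECALL = 102
    LOG_STAGE_TIMES = 903
    LOG_MISSING_FIELDS = 904


# Levels whose description explicitly says that no prediction is performed.
NO_PREDICTION = {DebugLevel.FEATURES_ONLY, DebugLevel.OUTPUT_FEATURES_ONLY}


def parse_debug_level(value):
    """Convert an int to DebugLevel, rejecting values that are not listed."""
    try:
        return DebugLevel(value)
    except ValueError:
        raise ValueError(f"unsupported debug_level: {value}") from None


level = parse_debug_level(1)
print(level.name, "skips prediction:", level in NO_PREDICTION)
```

Because DebugLevel is an IntEnum, a member can be assigned directly to the int32 debug_level field of a request.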

Service status codes

A TorchEasyRec service returns the following status codes. For more information about status codes returned by an EAS service, see Appendix: service status codes and common errors.

| Status code | Description |
| --- | --- |
| 200 | The request was successful. |
| 400 | The request is invalid. |
| 500 | Prediction failed. Check the service log for details. |

Save and parse a Protobuf request

For processor version 1.12 and later, when you enable debug mode by setting debug=True in the PAI-REC engine request body, the processor saves the original request, item-side input features, and transformed item-side features to a protobuf file on disk for feature analysis and validation. To use this feature, set the request_log_path parameter to a destination path mounted via OSS. For example:

"model_config": {
        "fg_mode": "normal",
        "fg_threads": 8,
        "request_log_path": "/request_log",
        "background_feature_thread_num": 8
},
 "storage": [
    {
        "mount_path": "/request_log",
        "oss": {
            "path": "oss://my-bucket/my-model/myrequests/",
            "readOnly": false
        }
    },
    {
        "mount_path": "/home/admin/docker_ml/workspace/model/",
        "oss": {
            "path": "oss://my-bucket/my-model/20260316",
            "readOnly": false
        }
    }
]

The processor creates a date_hour subdirectory in the path specified by request_log_path and stores the request data. Background threads write this data to disk asynchronously. The number of background threads is set by the model_config.background_feature_thread_num parameter, which defaults to 4. Increasing this value can improve write throughput. The Protobuf files written to disk are named <request_id>_<random_str>.pb. Because OSS has limited write bandwidth, avoid sending excessive request traffic to the PAI-REC engine when debug mode is enabled. If writes fall behind, the model service's internal queue drops new requests.
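
Since request logging depends on request_log_path pointing at a writable mount, a quick check of the service configuration before deployment can catch a mismatch. The following sketch is an assumption about how you might do that; check_request_log_mount is a hypothetical helper, not part of any EAS SDK, and it inspects only the fields shown in the example above.

```python
def check_request_log_mount(config):
    """Return a list of problems with the request_log_path setup (empty if none)."""
    log_path = config.get("model_config", {}).get("request_log_path")
    if not log_path:
        return ["model_config.request_log_path is not set"]
    for entry in config.get("storage", []):
        if entry.get("mount_path", "").rstrip("/") == log_path.rstrip("/"):
            # Read the OSS options regardless of how the key is capitalized.
            oss = next((v for k, v in entry.items() if k.lower() == "oss"), {})
            if oss.get("readOnly", False):
                return [f"the mount at {log_path} is read-only"]
            return []
    return [f"no storage entry is mounted at {log_path}"]


service_config = {
    "model_config": {
        "fg_mode": "normal",
        "request_log_path": "/request_log",
        "background_feature_thread_num": 8,
    },
    "storage": [
        {
            "mount_path": "/request_log",
            "oss": {"path": "oss://my-bucket/my-model/myrequests/", "readOnly": False},
        }
    ],
}
print(check_request_log_mount(service_config))
```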

To parse the generated protobuf file, use EAS-Python-SDK version 0.35 or later, or EAS-Java-SDK version 2.0.29 or later. The following is a Python example:

from eas_prediction.torchrec_predict_pb2 import PBLogData
with open('xxxx.pb', 'rb') as f:
    pb_data = f.read() 
pb_log = PBLogData()
pb_log.ParseFromString(pb_data)
print(pb_log) # Print the entire log

print(pb_log.request) # Print the request
print(pb_log.raw_features) # Print raw item-side features
print(pb_log.generate_features) # Print generated item-side features
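
To analyze more than one file, you can first collect the saved files from a local copy of the log directory. The following sketch uses only the Python standard library. The helper names are hypothetical, and splitting the file name at the last underscore assumes that the random string contains no underscore.

```python
from pathlib import Path


def split_log_name(path):
    """Split '<request_id>_<random_str>.pb' into (request_id, random_str).

    Assumption: random_str contains no underscore, so the split is made at
    the last underscore in the file name.
    """
    request_id, _, random_str = Path(path).stem.rpartition("_")
    return request_id, random_str


def iter_request_logs(root):
    """Yield every .pb file below root, including the date_hour subdirectories."""
    yield from sorted(Path(root).rglob("*.pb"))


# "./request_log" is a placeholder for a local copy of the log directory.
for pb_path in iter_request_logs("./request_log"):
    request_id, _ = split_log_name(pb_path)
    print(request_id, pb_path)
```

Each path returned by iter_request_logs can then be read and passed to PBLogData.ParseFromString as in the preceding example.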

Model service warm-up

A model service may exhibit initial response time spikes during startup or updates due to software and hardware characteristics. To prevent these spikes, configure the warm-up feature for the processor. To enable this feature in easyrec-torch-1.5 and later versions, add three parameters to model_config. For example:

"warmup_data_path": "/warmup",  # Enables warmup and sets the directory for warmup files.
"warmup_cnt_per_file": 20, # Number of warmup iterations per file. A higher value results in a more thorough warmup.
"warmup_pb_files_count": 64 # Number of online requests to save as protobuf files for warmup. A higher value covers more data patterns. Files are saved to the `warmup_data_path` directory.

To persist the Protobuf files, add an OSS mount to the storage section whose mount_path matches warmup_data_path. For example:

"storage": [
    ...,
    {
        "mount_path": "/warmup",
        "oss": {
            "path": "oss://<your-warmup-pb-file-path>",
            "readOnly": false
        }
    }
]

On its first start after configuration, the processor captures and saves the number of live requests specified by warmup_pb_files_count. On subsequent restarts, it uses these saved protobuf files for warm-up.
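
The # comments in the warm-up example are explanatory only, because JSON has no comment syntax. As a sketch, the following Python snippet merges the three warm-up parameters into a model_config and prints comment-free JSON. The other model_config keys are copied from the earlier example. The replay estimate assumes that every saved file is replayed warmup_cnt_per_file times, which the description of the parameters suggests but does not state outright.

```python
import json

warmup = {
    "warmup_data_path": "/warmup",
    "warmup_cnt_per_file": 20,
    "warmup_pb_files_count": 64,
}

# Merge the warm-up parameters into an existing model_config.
model_config = {"fg_mode": "normal", "fg_threads": 8, **warmup}

# Estimated number of replayed requests per start (assumption: each saved
# file is replayed warmup_cnt_per_file times).
replays = warmup["warmup_cnt_per_file"] * warmup["warmup_pb_files_count"]

print(json.dumps({"model_config": model_config}, indent=4))
print("estimated warm-up requests per start:", replays)
```

A rough estimate like this may help when choosing values, since a larger product is likely to lengthen the warm-up phase.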