
Platform For AI: TorchEasyRec Processor

Last Updated: Mar 18, 2026

The built-in TorchEasyRec processor in Elastic Algorithm Service (EAS) deploys recommendation models trained with TorchEasyRec or PyTorch as scoring services. It integrates feature engineering capabilities to deliver high-performance scoring by jointly optimizing feature engineering and the PyTorch model.

Architecture

The following figure shows the recommendation engine architecture based on the TorchEasyRec processor.


The TorchEasyRec processor contains the following modules:

  • Item Feature Cache: Caches item-side features from FeatureStore in memory to reduce network overhead and request pressure on FeatureStore. This improves inference service performance. When item-side features include real-time features, FeatureStore handles synchronization.

  • Feature Generator (FG): Generates features based on a configuration file. A single set of C++ code ensures consistent logic for offline and online feature processing.

  • TorchModel: A PyTorch model exported as a ScriptedModel after training with TorchEasyRec or PyTorch.
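The caching pattern used by the Item Feature Cache module can be sketched in a few lines of Python: item features are bulk-loaded into memory so that scoring requests are served without a network round trip to FeatureStore. This is an illustrative sketch, not processor code; the `fetch_from_feature_store` function and the feature names are hypothetical.

```python
def fetch_from_feature_store(item_ids):
    """Stand-in for a batched FeatureStore lookup (hypothetical)."""
    return {i: {"price": 9.9, "category": "books"} for i in item_ids}

class ItemFeatureCache:
    def __init__(self):
        self._cache = {}

    def warm_up(self, item_ids):
        # Bulk-load item features into memory at startup.
        self._cache.update(fetch_from_feature_store(item_ids))

    def get(self, item_id):
        # Serve from memory; fall back to FeatureStore on a cache miss.
        if item_id not in self._cache:
            self._cache.update(fetch_from_feature_store([item_id]))
        return self._cache[item_id]

cache = ItemFeatureCache()
cache.warm_up(["item_0001", "item_0002"])
print(cache.get("item_0001")["category"])  # served from memory
```

With real-time features, the same in-memory structure would additionally be refreshed by FeatureStore's synchronization, as described above.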

Limitations

Only the following instance types are supported: general-purpose instance families g6, g7, and g8, and GPU models such as T4 and A10. For more information, see general-purpose instance families (g series). For GPU-based services, the CUDA driver version must be 535 or later.

Version history

The TorchEasyRec processor is under active development. Use the latest version to deploy inference services. New versions provide additional features and improved performance. Released versions:

easyrec-torch-0.1 (released 2024-09-10, Torch 2.4, FG 0.2.9)

  • Supports Feature Generator (FG) and the FeatureStore item feature cache.

  • Supports CPU and GPU inference for Torch models.

  • Supports automatic expansion of user-side features (INPUT_TILE).

  • Supports Faiss vector recall.

  • Supports model warm-up in normal mode.

easyrec-torch-0.2 (released 2024-09-30, Torch 2.4, FG 0.2.9)

  • FeatureDB supports complex types.

  • Accelerates data loading during FeatureStore initialization.

  • Optimizes the debug_level behavior in bypass mode.

  • Optimizes host-to-device (H2D) transfers.

easyrec-torch-0.3 (released 2024-10-14, Torch 2.4, FG 0.2.9)

  • FeatureStore supports JSON initialization.

  • Redefines the proto.

easyrec-torch-0.4 (released 2024-10-28, Torch 2.4, FG 0.3.1)

  • Fixes an issue with complex types in Feature Generator (FG).

easyrec-torch-0.5 (released 2024-11-14, Torch 2.4, FG 0.3.1)

  • Optimizes online and offline consistency logic. When debug is enabled, feature information is generated after FG regardless of whether the item exists.

easyrec-torch-0.6 (released 2024-11-18, Torch 2.4, FG 0.3.6)

  • Optimizes the packaging process to remove redundant header files.

easyrec-torch-0.7 (released 2024-12-06, Torch 2.5, FG 0.3.9)

  • The sequence primary key supports arrays.

  • Upgrades the Torch version to 2.5.

  • Upgrades the FG version to 0.3.9.

easyrec-torch-0.8 (released 2024-12-25, Torch 2.5, FG 0.3.9)

  • Upgrades the TensorRT SDK version to 2.5.

  • Model input of TorchEasyRec supports the int64 type.

  • Upgrades the FeatureStore version to resolve Hologres feature query issues.

  • Optimizes running efficiency and logic during debugging.

  • Adds item_features to the proto to support passing item features from the request side.

easyrec-torch-0.9 (released 2025-01-15, Torch 2.5, FG 0.4.1)

  • Upgrades the Feature Generator (FG) version to 0.4.1 to optimize initialization time for multi-threaded FG.

easyrec-torch-1.0 (released 2025-02-06, Torch 2.5, FG 0.4.2)

  • Supports weighted features.

  • Upgrades the Feature Generator (FG) version to 0.4.2.

  • Supports AMD CPUs.

easyrec-torch-1.1 (released 2025-04-23, Torch 2.5, FG 0.5.9)

  • Upgrades the FeatureStore SDK to support high-speed connectivity to FeatureDB over VPC networks, and to filter expired real-time feature data in memory based on event_time and ttl.

  • Upgrades the Feature Generator (FG) version to support custom sequence features and fix issues related to combo features.

easyrec-torch-1.2 (released 2025-05-12, Torch 2.5, FG 0.6.0)

  • Upgrades FG to 0.6.0.

  • Supports reading features from multiple FeatureStore entities, for example, config["fs_entity"] = "item,raw".

  • Outputs item IDs in the request that are not found in FeatureStore during debugging.

easyrec-torch-1.3 (released 2025-05-29, Torch 2.5, FG 0.6.5)

  • Upgrades FG to 0.6.5.

  • The weighted ID feature supports FSMAP.

  • Supports WordPiece tokenization.

  • Supports the boolean_mask filter operator.

  • Optimizes and upgrades the expression feature operator.

easyrec-torch-1.4 (released 2025-07-15, Torch 2.5, FG 0.6.9)

  • Upgrades FG to 0.6.9.

  • Adds several functions to the expression feature operator.

  • Moves the debug string generation logic from the processor to the FG library.

easyrec-torch-1.5 (released 2025-09-18, Torch 2.5, FG 0.7.3)

  • Upgrades FG to 0.7.3.

  • Supports capturing online requests for model warm-up.

  • Upgrades the FeatureStore SDK to 20250826 to support three-level table schemas in MaxCompute, zero-trust calls without an AccessKey, and compatibility with adding features to feature views.

easyrec-torch-1.6 (released 2025-10-21, Torch 2.5, FG 0.7.4)

  • Optimizes log control to prevent excessive output from degrading performance when there are many callback requests.

  • Optimizes context feature processing.

  • Shares a thread pool between feature pre-processing and FG to save thread resources.

  • Upgrades FG to 0.7.4.

easyrec-torch-1.7 (released 2025-11-04, Torch 2.5, FG 0.7.4)

  • Optimizes the logic for saving debug tensors to avoid excessive file writes triggered by callbacks.

easyrec-torch-1.8 (released 2025-12-01, Torch 2.5, FG 0.7.4)

  • Optimizes the FeatureStore SDK thread pool to prevent thread creation failures when resources are tight.

  • Upgrades the FeatureStore SDK to 20251117.

easyrec-torch-1.9 (released 2026-01-09, Torch 2.5, FG 1.0.0)

  • GPU inference supports CUDA multi-stream to improve throughput and performance.

  • Upgrades FG to 1.0.0.

easyrec-torch-1.10 (released 2026-01-23, Torch 2.5, FG 1.0.1)

  • Automatically records the running time of slow requests in logs.

  • Adds a configuration parameter to save request data when a slow request occurs.

easyrec-torch-1.11 (released 2026-02-10, Torch 2.5, FG 1.0.1)

  • Fixes a memory contiguity issue of output tensors in some scenarios.

  • Upgrades the FeatureStore SDK to 20260202.

easyrec-torch-1.12 (released 2026-03-13, Torch 2.5, FG 1.0.1)

  • When debug mode is enabled in the request body of the PAI-Rec engine, the model service asynchronously saves the original request and the item-side features (both before and after FG) to disk in protobuf format. Use the request_log_path parameter to specify the save path, and mount an OSS path at startup.

  • Upgrades the FeatureStore SDK to 20260305.

Version 2.0 and later requirements:

The GLIBC version of the EAS backend base image was upgraded in easyrec-torch-2.0. When deploying version 2.0 or later:

  1. For new EAS services, follow the standard deployment process. The process is identical to deploying versions 0.x/1.x.

  2. For existing EAS services created before March 15, 2026, contact your Alibaba Cloud technical expert to upgrade the EAS backend base image before updating and deploying the processor. Otherwise, deployment may fail due to environment incompatibility.

easyrec-torch-2.0 (released 2026-03-17, Torch 2.8, FG 1.0.1)

  • Upgrades the PyTorch runtime to 2.8.

  • Upgrades the CUDA runtime to 12.6.

  • Upgrades the fbgemm_gpu runtime to 1.3.

  • Upgrades the base image GLIBC to 2.38.

Deploy a service

  1. Prepare the service configuration file torcheasyrec.json.

    Set the processor type to easyrec-torch-{version}. For {version}, see Version history. The following examples show JSON configurations:

    Example: Using FG (fg_mode='normal')

    {
      "metadata": {
        "instance": 1,
        "name": "alirec_rank_with_fg",
        "rpc": {
          "enable_jemalloc": 1,
          "max_queue_size": 256,
          "worker_threads": 16
        }
      },
      "cloud": {
            "computing": {
                "instance_type": "ecs.gn6i-c16g1.4xlarge"
            }
      },
      "model_config": {
        "fg_mode": "normal",
        "fg_threads": 8,
        "region": "YOUR_REGION",
        "fs_project": "YOUR_FS_PROJECT",
        "fs_model": "YOUR_FS_MODEL",
        "fs_entity": "item",
        "load_feature_from_offlinestore": true,
        "access_key_id":"YOUR_ACCESS_KEY_ID",
        "access_key_secret":"YOUR_ACCESS_KEY_SECRET"
      },
      "storage": [
        {
          "mount_path": "/home/admin/docker_ml/workspace/model/",
          "oss": {
            "path": "oss://xxx/xxx/export",
            "readOnly": false
          },
          "properties": {
            "resource_type": "code"
          }
        }
      ],
      "processor":"easyrec-torch-1.12"
    }

    Example: Not using FG (fg_mode='bypass')

    {
      "metadata": {
        "instance": 1,
        "name": "alirec_rank_no_fg",
        "rpc": {
          "enable_jemalloc": 1,
          "max_queue_size": 256,
          "worker_threads": 16
        }
      },
      "cloud": {
            "computing": {
                "instance_type": "ecs.gn6i-c16g1.4xlarge"
            }
      },
      "model_config": {
        "fg_mode": "bypass"
      },
      "storage": [
        {
          "mount_path": "/home/admin/docker_ml/workspace/model/",
          "oss": {
            "path": "oss://xxx/xxx/export",
            "readOnly": false
          },
          "properties": {
            "resource_type": "code"
          }
        }
      ],
      "processor":"easyrec-torch-1.12"
    }

    The following table describes key parameters. For other parameters, see JSON deployment.

    • processor (required): The TorchEasyRec processor version. Example: "processor": "easyrec-torch-1.12"

    • path (required): The OSS path mounted to service storage, which stores the model files. Example: "path": "oss://examplebucket/xxx/export"

    • fg_mode (optional): The feature engineering mode. Valid values:

      • bypass (default): Does not use FG; only the Torch model is deployed. This mode is suitable for scenarios that use custom feature processing, and the FeatureStore access parameters are not required.

      • normal: Uses FG. This mode is typically used with models trained by TorchEasyRec.

      Example: "fg_mode": "normal"

    • fg_threads (optional): The number of concurrent threads that run FG for a single request. Example: "fg_threads": 15

    • outputs (optional): The names of the output variables predicted by the Torch model, such as probs_ctr. Separate multiple names with commas (,). By default, all variables are output. Example: "outputs": "probs_ctr,probs_cvr"

    • item_empty_score (optional): The default score for an item ID that does not exist. Default value: 0. Example: "item_empty_score": -1

    Processor recall parameters:

    • faiss_neigh_num (optional): The number of vectors to recall with Faiss. The value is taken from the faiss_neigh_num field in the request body; if that field is absent, the faiss_neigh_num value in model_config is used. Default value: 1. Example: "faiss_neigh_num": 200

    • faiss_nprobe (optional): The number of clusters to probe during retrieval. The inverted file index in Faiss partitions the data into many small clusters and maintains a posting list for each. A larger nprobe value usually improves search precision but increases computing cost and search time; a smaller value reduces precision but speeds up the search. Default value: 800. Example: "faiss_nprobe": 700

    Parameters for the processor to access FeatureStore:

    • fs_project (optional): The name of the FeatureStore project. Required when FeatureStore is used. See Configure a FeatureStore project. Example: "fs_project": "fs_demo"

    • fs_model (optional): The model feature name in FeatureStore. Example: "fs_model": "fs_rank_v1"

    • fs_entity (optional): The entity name in FeatureStore. Example: "fs_entity": "item"

    • region (optional): The region where FeatureStore resides, such as cn-beijing for China (Beijing). See Endpoints. Example: "region": "cn-beijing"

    • access_key_id (optional): The AccessKey ID used to access FeatureStore. Example: "access_key_id": "xxxxx"

    • access_key_secret (optional): The AccessKey secret used to access FeatureStore. Example: "access_key_secret": "xxxxx"

    • load_feature_from_offlinestore (optional): Specifies whether to load offline feature data directly from the FeatureStore OfflineStore. Valid values: true (load from OfflineStore) and false (default; load from OnlineStore). Example: "load_feature_from_offlinestore": true

    • featuredb_username (optional): The username for FeatureDB. Example: "featuredb_username": "xxx"

    • featuredb_password (optional): The password for FeatureDB. Example: "featuredb_password": "xxx"

    Parameters for automatic feature expansion (input_tile):

    • INPUT_TILE (optional): Enables automatic feature expansion. For features that have the same value across a single request, such as user_id, only one value needs to be passed. This reduces the request size, network transmission time, and computation time.

      This feature must be used in normal mode with TorchEasyRec, and the corresponding environment variable must be set when the model is exported. The system reads the INPUT_TILE value from model_acc.json in the model directory exported by TorchEasyRec; if that file does not exist, the value is read from the environment variable.

      After this feature is enabled:

      • If the environment variable is set to 2, FG for user-side features is computed only once per request.

      • If the environment variable is set to 3, FG for user-side features is computed only once, and user and item embeddings are computed separately, with the user-side embedding computed only once. This is suitable for scenarios with many user-side features.

      Example:

      "processor_envs": [
        {
          "name": "INPUT_TILE",
          "value": "2"
        }
      ]
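The effect of INPUT_TILE=2 can be illustrated with a small sketch: user-side features arrive once per request and are expanded to match the item batch before the model runs. The function below is illustrative and is not the processor's implementation.

```python
def tile_user_features(user_features, num_items):
    """Expand per-request user features to one row per scored item."""
    return {name: [value] * num_items for name, value in user_features.items()}

# The request carries user features once, plus a batch of item IDs.
user_features = {"user_id": 33981, "age": 12}
item_ids = ["item_0001", "item_0002", "item_0003"]

batch = tile_user_features(user_features, len(item_ids))
print(batch["user_id"])  # [33981, 33981, 33981]
```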

    • NO_GRAD_GUARD (optional): Disables gradient computation during inference. This stops operation tracking and prevents construction of the computation graph.

      Note: When this variable is set to 1, some models may be incompatible. If inference stutters on the second run, add the environment variable PYTORCH_TENSOREXPR_FALLBACK=2 to resolve the issue. This skips the compilation step while retaining some graph optimization features.

      Example:

      "processor_envs": [
        {
          "name": "NO_GRAD_GUARD",
          "value": "1"
        }
      ]

    Model warm-up parameters:

    • warmup_data_path (optional): Enables model warm-up and specifies the path where warm-up files are saved. To avoid losing warm-up files, add an OSS mount for this path in the storage configuration. Example: "warmup_data_path": "/warmup"

    • warmup_cnt_per_file (optional): The number of warm-up runs for each warm-up protobuf file. Increasing this value ensures sufficient warm-up but extends the ramp-up period. Default value: 20. Example: "warmup_cnt_per_file": 20

    • warmup_pb_files_count (optional): The number of online requests to capture. The requests are saved as protobuf files, in the path specified by warmup_data_path, and replayed for warm-up at the next startup. Default value: 64. Example: "warmup_pb_files_count": 64

    Slow request logging and saving:

    • long_request_threshold (optional): The time threshold for slow requests, in milliseconds. For requests that exceed this threshold, the running time of each stage is automatically recorded in the log. Default value: 200. Example: "long_request_threshold": 200

    • save_long_request (optional): Specifies whether to save a request as a protobuf file when it exceeds long_request_threshold. Default value: false. Example: "save_long_request": true

    Parameters for writing original requests and item features to OSS:

    • request_log_path (optional): The path on disk where protobuf files are saved. Use an OSS mount for this path in the model service configuration. Example: "request_log_path": "/online_log_pb"

    • background_feature_thread_num (optional): The number of threads for the background task that writes files to disk. If the write workload is heavy, increase this value to speed up saving protobuf files. Default value: 4. Example: "background_feature_thread_num": 8

  2. Deploy the TorchEasyRec model service using one of the following methods:

    On-premises deployment using JSON (recommended)

    Procedure:

    1. Log on to the PAI console. Select a region on the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).

    2. On the Elastic Algorithm Service (EAS) page, click Deploy Service, and then in the Custom Model Deployment section, click JSON On-Premises Deployment.

    3. In the JSON text box, enter the prepared JSON configuration, and click Deploy.

    Deployment using eascmd

    1. Download and authenticate the client. This topic uses the Windows 64-bit version as an example.

    2. Open a terminal tool. In the directory where the JSON file is located, run the following command to create a service. See Command reference.

      eascmdwin64.exe create <service.json>

      Replace <service.json> with the name of the JSON file you created, such as torcheasyrec.json.

Invoke a service

After deploying the TorchEasyRec model service, view service invocation information:

  1. Log on to the PAI console, select the region at the top of the page and the workspace on the right, and then click Enter EAS.

  2. Click Invocation Information in the Service Method column of the target service to view the service endpoint and token.

The input and output of the TorchEasyRec model service are in protobuf format. The invocation method depends on whether FG is used:

Using FG (fg_mode='normal')

Two invocation methods are supported:

Using the EAS Java SDK

Before running the code, configure the Maven environment. See Use the Java SDK. For the latest Java SDK version, see https://github.com/pai-eas/eas-java-sdk. The following example shows how to request the alirec_rank_with_fg service:

package com.aliyun.openservices.eas.predict;

import com.aliyun.openservices.eas.predict.http.Compressor;
import com.aliyun.openservices.eas.predict.http.HttpConfig;
import com.aliyun.openservices.eas.predict.http.PredictClient;
import com.aliyun.openservices.eas.predict.proto.TorchRecPredictProtos;
import com.aliyun.openservices.eas.predict.request.TorchRecRequest;
import com.aliyun.openservices.eas.predict.proto.TorchPredictProtos.ArrayProto;

import java.util.*;


public class TorchRecPredictTest {
    public static PredictClient InitClient() {
        return new PredictClient(new HttpConfig());
    }

    public static TorchRecRequest buildPredictRequest() {
        TorchRecRequest request = new TorchRecRequest();
        request.appendItemId("7033");

        request.addUserFeature("user_id", 33981, "int");

        ArrayList<Double> list = new ArrayList<>();
        list.add(0.24689289764507472);
        list.add(0.005758482924454689);
        list.add(0.6765301324940026);
        list.add(0.18137273055602343);
        request.addUserFeature("raw_3", list, "List<double>");

        Map<String, Integer> myMap = new LinkedHashMap<>();
        myMap.put("866", 4143);
        myMap.put("1627", 2451);
        request.addUserFeature("map_1", myMap, "map<string,int>");

        ArrayList<ArrayList<Float>> list2 = new ArrayList<>();
        ArrayList<Float> innerList1 = new ArrayList<>();
        innerList1.add(1.1f);
        innerList1.add(2.2f);
        innerList1.add(3.3f);
        list2.add(innerList1);
        ArrayList<Float> innerList2 = new ArrayList<>();
        innerList2.add(4.4f);
        innerList2.add(5.5f);
        list2.add(innerList2);
        request.addUserFeature("click", list2, "list<list<float>>");

        request.addContextFeature("id_2", list, "List<double>");
        request.addContextFeature("id_2", list, "List<double>");

        System.out.println(request.request);
        return request;
    }
    }

    public static void main(String[] args) throws Exception{
        PredictClient client = InitClient();
        client.setToken("tokenGeneratedFromService");
        client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com");
        client.setModelName("alirec_rank_with_fg");
        client.setRequestTimeout(100000);


        testInvoke(client);
        testDebugLevel(client);
        client.shutdown();
    }

    public static void testInvoke(PredictClient client) throws Exception {
        long startTime = System.currentTimeMillis();
        TorchRecPredictProtos.PBResponse response = client.predict(buildPredictRequest());
        for (Map.Entry<String, ArrayProto> entry : response.getMapOutputsMap().entrySet()) {

            System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");

    }

    public static void testDebugLevel(PredictClient client) throws Exception {
        long startTime = System.currentTimeMillis();
        TorchRecRequest request = buildPredictRequest();
        request.setDebugLevel(1);
        TorchRecPredictProtos.PBResponse response = client.predict(request);
        Map<String, String> genFeas = response.getGenerateFeaturesMap();
        for(String itemId: genFeas.keySet()) {
            System.out.println(itemId);
            System.out.println(genFeas.get(itemId));
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");

    }
}

Where:

  • client.setToken("tokenGeneratedFromService"): Replace the parameter in parentheses with your service token. For example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.

  • client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com"): Replace the parameter in parentheses with your service endpoint. For example, 175805416243****.cn-beijing.pai-eas.aliyuncs.com.

  • client.setModelName("alirec_rank_with_fg"): Replace the parameter in parentheses with your service name.

Using the EAS Python SDK

Before you run the code, run the pip install -U eas-prediction --user command to install or update the eas-prediction library. For more information, see Use the Python SDK. The following code provides an example:

from eas_prediction import PredictClient
from eas_prediction.torchrec_request import TorchRecRequest


if __name__ == '__main__':
    endpoint = 'http://localhost:6016'

    client = PredictClient(endpoint, '<YOUR_SERVICE_NAME>')
    client.set_token('<your_service_token>')
    client.init()
    torchrec_req = TorchRecRequest()

    torchrec_req.add_user_fea('user_id', 'u001d', "STRING")
    torchrec_req.add_user_fea('age', 12, "INT")
    torchrec_req.add_user_fea('weight', 129.8, "FLOAT")
    torchrec_req.add_item_id('item_0001')
    torchrec_req.add_item_id('item_0002')
    torchrec_req.add_item_id('item_0003')
    torchrec_req.add_user_fea("raw_3", [0.24689289764507472, 0.005758482924454689, 0.6765301324940026, 0.18137273055602343], "list<double>")
    torchrec_req.add_user_fea("raw_4", [0.9965264740966043, 0.659596586238391, 0.16396649403055896, 0.08364986620265635], "list<double>")
    torchrec_req.add_user_fea("map_1", {"0":0.37845234405201145}, "map<int,float>")
    torchrec_req.add_user_fea("map_2", {"866":4143,"1627":2451}, "map<int,int>")
    torchrec_req.add_context_fea("id_2", [866], "list<int>" )
    torchrec_req.add_context_fea("id_2", [7022,1], "list<int>" )
    torchrec_req.add_context_fea("id_2", [7022,1], "list<int>" )
    torchrec_req.add_user_fea("click", [[0.94433516,0.49145547], [0.94433516, 0.49145597]], "list<list<float>>")

    res = client.predict(torchrec_req)
    print(res)

The following describes the key parameters:

  • endpoint: Set this to your service endpoint, for example, http://175805416243****.cn-beijing.pai-eas.aliyuncs.com/.

  • <YOUR_SERVICE_NAME>: Replace this with your service name.

  • <your_service_token>: Replace this with your service token, for example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.

Not using FG (fg_mode='bypass')

Using the EAS Java SDK

Before you run the code, configure the Maven environment. For more information, see Use the Java SDK. Check the GitHub page for the latest version number of the SDK. The following code provides an example of how to request the alirec_rank_no_fg service:

package com.aliyun.openservices.eas.predict;

import java.util.List;
import java.util.Arrays;


import com.aliyun.openservices.eas.predict.http.PredictClient;
import com.aliyun.openservices.eas.predict.http.HttpConfig;
import com.aliyun.openservices.eas.predict.request.TorchDataType;
import com.aliyun.openservices.eas.predict.request.TorchRequest;
import com.aliyun.openservices.eas.predict.response.TorchResponse;

public class Test_Torch {
    public static PredictClient InitClient() {
        return new PredictClient(new HttpConfig());
    }

    public static TorchRequest buildPredictRequest() {
        TorchRequest request = new TorchRequest();
        float[] content = new float[2304000];
        for (int i = 0; i < content.length; i++) {
            content[i] = (float) 0.0;
        }
        long[] content_i = new long[900];
        for (int i = 0; i < content_i.length; i++) {
            content_i[i] = 0;
        }

        long[] a = Arrays.copyOfRange(content_i, 0, 300);
        float[] b = Arrays.copyOfRange(content, 0, 230400);
        request.addFeed(0, TorchDataType.DT_INT64, new long[]{300,3}, content_i);
        request.addFeed(1, TorchDataType.DT_FLOAT, new long[]{300,10,768}, content);
        request.addFeed(2, TorchDataType.DT_FLOAT, new long[]{300,768}, b);
        request.addFeed(3, TorchDataType.DT_INT64, new long[]{300}, a);
        request.addFetch(0);
        request.setDebugLevel(903);
        return request;
    }

    public static void main(String[] args) throws Exception {
        PredictClient client = InitClient();
        client.setToken("tokenGeneratedFromService");
        client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com");
        client.setModelName("alirec_rank_no_fg");
        client.setIsCompressed(false);
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            TorchResponse response = null;
            try {
                response = client.predict(buildPredictRequest());
                List<Float> result = response.getFloatVals(0);
                System.out.print("Predict Result: [");
                for (int j = 0; j < result.size(); j++) {
                    System.out.print(result.get(j).floatValue());
                    if (j != result.size() - 1) {
                        System.out.print(", ");
                    }
                }
                System.out.print("]\n");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Spend Time: " + (endTime - startTime) + "ms");
        client.shutdown();
    }
}

Where:

  • client.setToken("tokenGeneratedFromService"): Replace the parameter in parentheses with your service token. For example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.

  • client.setEndpoint("175805416243****.cn-beijing.pai-eas.aliyuncs.com"): Replace the parameter in parentheses with your service endpoint. For example, 175805416243****.cn-beijing.pai-eas.aliyuncs.com.

  • client.setModelName("alirec_rank_no_fg"): Replace the parameter in parentheses with your service name.

Using the EAS Python SDK

Before you run the code, run the pip install -U eas-prediction --user command to install or update the eas-prediction library. For more information, see Use the Python SDK. The following code provides an example of how to request the alirec_rank_no_fg service:

from eas_prediction import PredictClient
from eas_prediction import TorchRequest

# Create a request without compressing (snappy) the request data
req = TorchRequest(False)

req.add_feed(0, [300, 3], TorchRequest.DT_INT64, [1] * 900)
req.add_feed(1, [300, 10, 768], TorchRequest.DT_FLOAT, [1.0] * 3 * 768000)
req.add_feed(2, [300, 768], TorchRequest.DT_FLOAT, [1.0] * 3 * 76800)
req.add_feed(3, [300], TorchRequest.DT_INT64, [1] * 300)


client = PredictClient('<your_endpoint>', '<your_service_name>')
client.set_token('<your_service_token>')

client.init()

resp = client.predict(req)
print(resp)

The following describes the key parameters:

  • <your_endpoint>: Replace this with your service endpoint, for example, http://175805416243****.cn-beijing.pai-eas.aliyuncs.com/.

  • <your_service_name>: Replace this with your service name.

  • <your_service_token>: Replace this with your service token, for example, MmFiMDdlO****wYjhhNjgwZmZjYjBjMTM1YjliZmNkODhjOGVi****.

For more information about the status codes returned when you access the service, see Service status code description. You can also build a service request manually. For more information, see Request format.
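Once the request bytes are built from the protobuf definitions in the next section, the HTTP call itself has a simple shape. The sketch below constructs the request object without sending it, under the assumption that the service is exposed at /api/predict/<service_name> and that the token is passed in the Authorization header, as with other EAS services; the endpoint, service name, and token are placeholders, and body_bytes stands in for a serialized PredictRequest message.

```python
from urllib import request as urlrequest

# Placeholders: replace with your own endpoint, service name, and token.
endpoint = "http://175805416243****.cn-beijing.pai-eas.aliyuncs.com"
service_name = "alirec_rank_with_fg"
token = "<your_service_token>"
body_bytes = b""  # stand-in for PredictRequest.SerializeToString()

# The serialized protobuf goes in the POST body; the token goes in the
# Authorization header.
req = urlrequest.Request(
    url=f"{endpoint}/api/predict/{service_name}",
    data=body_bytes,
    headers={"Authorization": token},
    method="POST",
)
print(req.full_url)
```

Sending the request (for example with urlrequest.urlopen) returns protobuf bytes that can be parsed into a PredictResponse message.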

Request format

When a client calls the service, you can manually generate the prediction request code file based on the .proto file. To build the service request manually, refer to the following protobuf definitions to generate the corresponding code:

pytorch_predict.proto: Request definition for a Torch model

syntax = "proto3";

package pytorch.eas;
option cc_enable_arenas = true;
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "TorchPredictProtos";

enum ArrayDataType {
  // Not a legal value for DataType. Used to indicate a DataType field
  // has not been set.
  DT_INVALID = 0;
  
  // Data types that all computation devices are expected to be
  // capable to support.
  DT_FLOAT = 1;
  DT_DOUBLE = 2;
  DT_INT32 = 3;
  DT_UINT8 = 4;
  DT_INT16 = 5;
  DT_INT8 = 6;
  DT_STRING = 7;
  DT_COMPLEX64 = 8;  // Single-precision complex
  DT_INT64 = 9;
  DT_BOOL = 10;
  DT_QINT8 = 11;     // Quantized int8
  DT_QUINT8 = 12;    // Quantized uint8
  DT_QINT32 = 13;    // Quantized int32
  DT_BFLOAT16 = 14;  // Float32 truncated to 16 bits.  Only for cast ops.
  DT_QINT16 = 15;    // Quantized int16
  DT_QUINT16 = 16;   // Quantized uint16
  DT_UINT16 = 17;
  DT_COMPLEX128 = 18;  // Double-precision complex
  DT_HALF = 19;
  DT_RESOURCE = 20;
  DT_VARIANT = 21;  // Arbitrary C++ data types
}

// Dimensions of an array
message ArrayShape {
  repeated int64 dim = 1 [packed = true];
}

// Protocol buffer representing an array
message ArrayProto {
  // Data Type.
  ArrayDataType dtype = 1;

  // Shape of the array.
  ArrayShape array_shape = 2;

  // DT_FLOAT.
  repeated float float_val = 3 [packed = true];

  // DT_DOUBLE.
  repeated double double_val = 4 [packed = true];

  // DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
  repeated int32 int_val = 5 [packed = true];

  // DT_STRING.
  repeated bytes string_val = 6;

  // DT_INT64.
  repeated int64 int64_val = 7 [packed = true];

}


message PredictRequest {

  // Input tensors.
  repeated ArrayProto inputs = 1;

  // Output filter.
  repeated int32 output_filter = 2;

  // Input tensors for rec
  map<string, ArrayProto> map_inputs = 3;

  // debug_level for rec
  int32 debug_level = 100;
}

// Response for PredictRequest on successful run.
message PredictResponse {
  // Output tensors.
  repeated ArrayProto outputs = 1;
  // Output tensors for rec.
  map<string, ArrayProto> map_outputs = 2;
}

torchrec_predict.proto: Request definition for a Torch model with FG

syntax = "proto3";

option go_package = ".;torch_predict_protos";
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "TorchRecPredictProtos";
package com.alibaba.pairec.processor;
import "pytorch_predict.proto";

//long->others
message LongStringMap {
  map<int64, string> map_field = 1;
}
message LongIntMap {
  map<int64, int32> map_field = 1;
}
message LongLongMap {
  map<int64, int64> map_field = 1;
}
message LongFloatMap {
  map<int64, float> map_field = 1;
}
message LongDoubleMap {
  map<int64, double> map_field = 1;
}

//string->others
message StringStringMap {
  map<string, string> map_field = 1;
}
message StringIntMap {
  map<string, int32> map_field = 1;
}
message StringLongMap {
  map<string, int64> map_field = 1;
}
message StringFloatMap {
  map<string, float> map_field = 1;
}
message StringDoubleMap {
  map<string, double> map_field = 1;
}

// int32 -> others
message IntStringMap {
  map<int32, string> map_field = 1;
}
message IntIntMap {
  map<int32, int32> map_field = 1;
}
message IntLongMap {
  map<int32, int64> map_field = 1;
}
message IntFloatMap {
  map<int32, float> map_field = 1;
}
message IntDoubleMap {
  map<int32, double> map_field = 1;
}

// list
message IntList {
  repeated int32 features = 1;
}
message LongList {
  repeated int64 features = 1;
}

message FloatList {
  repeated float features = 1;
}
message DoubleList {
  repeated double features = 1;
}
message StringList {
  repeated string features = 1;
}

// lists
message IntLists {
  repeated IntList lists = 1;
}
message LongLists {
  repeated LongList lists = 1;
}

message FloatLists {
  repeated FloatList lists = 1;
}
message DoubleLists {
  repeated DoubleList lists = 1;
}
message StringLists {
  repeated StringList lists = 1;
}

message PBFeature {
  oneof value {
    int32 int_feature = 1;
    int64 long_feature = 2;
    string string_feature = 3;
    float float_feature = 4;
    double double_feature = 5;

    LongStringMap long_string_map = 6;
    LongIntMap long_int_map = 7;
    LongLongMap long_long_map = 8;
    LongFloatMap long_float_map = 9;
    LongDoubleMap long_double_map = 10;

    StringStringMap string_string_map = 11;
    StringIntMap string_int_map = 12;
    StringLongMap string_long_map = 13;
    StringFloatMap string_float_map = 14;
    StringDoubleMap string_double_map = 15;

    IntStringMap int_string_map = 16;
    IntIntMap int_int_map = 17;
    IntLongMap int_long_map = 18;
    IntFloatMap int_float_map = 19;
    IntDoubleMap int_double_map = 20;

    IntList int_list = 21;
    LongList long_list = 22;
    StringList string_list = 23;
    FloatList float_list = 24;
    DoubleList double_list = 25;

    IntLists int_lists = 26;
    LongLists long_lists = 27;
    StringLists string_lists = 28;
    FloatLists float_lists = 29;
    DoubleLists double_lists = 30;
  }
}

// context features
message ContextFeatures {
  repeated PBFeature features = 1;
}

// PBRequest specifies the request for aggregator
message PBRequest {
  // debug mode
  int32 debug_level = 1;

  // user features, key is user input name
  map<string, PBFeature> user_features = 2;

  // item ids
  repeated string item_ids = 3;

  // context features for each item, key is context input name 
  map<string, ContextFeatures> context_features = 4;

  // number of nearest neighbors(items) to retrieve
  // from faiss
  int32 faiss_neigh_num = 5;

  // item features for each item, key is item input name 
  map<string, ContextFeatures> item_features = 6;
  
  // optional meta data
  map<string, string> meta_data = 7;
}

// PBResponse specifies the response for aggregator
message PBResponse {
  // torch output tensors
  map<string, pytorch.eas.ArrayProto> map_outputs = 1;

  // fg output features
  map<string, string> generate_features = 2;

  // all fg input features
  map<string, string> raw_features = 3;

  // item ids
  repeated string item_ids = 4;

}
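To make the PBFeature oneof above concrete, the following stdlib-only sketch shows how a client might decide which oneof field a native Python value maps to. The field names are taken from the definition above; the dispatch rules themselves (for example, routing ints within int32 range to int_feature and all float lists to double_list) are illustrative assumptions for this sketch, not documented SDK behavior.

```python
# Illustrative only: pick the PBFeature oneof field name for a raw Python
# value. Field names come from the PBFeature definition; the selection
# rules are assumptions made for this sketch.
def pb_feature_field(value):
    if isinstance(value, bool):
        raise TypeError("bool has no PBFeature mapping in this sketch")
    if isinstance(value, int):
        # Values within int32 range fit int_feature; larger ones need long_feature.
        return "int_feature" if -2**31 <= value < 2**31 else "long_feature"
    if isinstance(value, float):
        return "double_feature"
    if isinstance(value, str):
        return "string_feature"
    if isinstance(value, list):
        if value and isinstance(value[0], list):
            # Nested lists map onto the *Lists wrappers.
            inner = pb_feature_field(value[0])
            return {"int_list": "int_lists", "long_list": "long_lists",
                    "string_list": "string_lists", "float_list": "float_lists",
                    "double_list": "double_lists"}[inner]
        if all(isinstance(v, int) for v in value):
            return "long_list"
        if all(isinstance(v, float) for v in value):
            return "double_list"
        return "string_list"  # fall back to strings for mixed lists
    raise TypeError(f"unsupported value type: {type(value)!r}")
```

A real client would set the chosen field on a generated PBFeature message rather than return its name; this sketch only illustrates the type dispatch.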

The following table describes the debug_level parameter.

Note

By default, you do not need to configure this parameter. Pass it only when debugging.

| debug_level | Description |
| --- | --- |
| 0 | The service performs prediction normally. |
| 1 | In normal mode, validates the keys of the request and the shapes of the FG input and output. Returns the input and output features but does not perform prediction. |
| 2 | In normal mode, validates the keys of the request and the shapes of the FG input and output. Returns the input and output features along with the model input tensor, and performs prediction. |
| 3 | In normal mode, validates the keys of the request and the shapes of the FG input and output. Returns the output features but does not perform prediction. |
| 100 | In normal mode, saves the prediction request. The saved protobuf file contains the original request and the item-side input and output features. The save path is specified by the request_log_path parameter. |
| 102 | In normal mode, performs vector recall, validates the keys of the request and the shapes of the FG input and output, and saves the input and output features, the model input tensor, and the user embedding result. |
| 903 | Prints the prediction time for each stage. |
| 904 | Checks the request for missing feature fields and records them in the log. |

Service status code description

The following table describes the main status codes that may be returned when you access a TorchEasyRec service. For more information about the status codes returned when you access an EAS service, see Appendix: Service status codes and common errors.

| Status code | Description |
| --- | --- |
| 200 | The service returns a normal response. |
| 400 | The request input is invalid. |
| 500 | The prediction failed. Check the service log for details. |
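As a minimal sketch, a client might branch on these status codes as follows. The handling policy (fail fast on 400, inspect logs on 500) is a common client-side convention, not something the service itself mandates.

```python
# Illustrative client-side dispatch on the TorchEasyRec status codes above.
def handle_status(code: int) -> str:
    if code == 200:
        return "ok"                # normal response; parse the body
    if code == 400:
        return "fix_request"       # invalid input; retrying will not help
    if code == 500:
        return "check_service_log" # prediction failed; inspect service logs
    return "see_eas_status_codes"  # other EAS-level codes; see the appendix
```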

Save and parse Request pb files

For processor versions 1.12 and later, when debug=True is set in the request body sent by the PAI-Rec engine, the processor saves the original request together with the item-side input and output features to a protobuf file on disk for later feature analysis and verification. The protobuf file contains the original request data, the item-side input features, and the item-side transformed features. To use this feature, set the request_log_path parameter to the save path and mount an OSS path at that location. For example:

"model_config": {
        "fg_mode": "normal",
        "fg_threads": 8,
        "request_log_path": "/request_log",
        "background_feature_thread_num": 8
},
 "storage": [
    {
        "mount_path": "/request_log",
        "oss": {
            "path": "oss://my-bucket/my-model/myrequests/",
            "readOnly": false
        }
    },
    {
        "mount_path": "/home/admin/docker_ml/workspace/model/",
        "oss": {
            "path": "oss://my-bucket/my-model/20260316",
            "readOnly": false
        }
    }
]

The processor creates a date_hour subdirectory under the path specified by request_log_path and saves the request data there. Disk writes are performed asynchronously by background threads; the thread count is configured by the model_config.background_feature_thread_num parameter (default: 4), and you can increase it to speed up writes. Saved protobuf files are named <request_id>_<random_str>.pb. Because OSS write bandwidth is limited, keep the volume of debug-enabled requests from the PAI-Rec engine moderate. If traffic is too high and writes fall behind, the model service's internal queue discards new requests without saving them.
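The layout described above can be sketched with the standard library. Only the overall <request_id>_<random_str>.pb file name shape is documented; the exact date_hour format and the random-string scheme used below are assumptions for illustration.

```python
import datetime
import uuid

# Reconstruct the on-disk layout described above: a <date_hour> subdirectory
# under request_log_path holding files named <request_id>_<random_str>.pb.
def saved_pb_path(request_log_path: str, request_id: str) -> str:
    date_hour = datetime.datetime.now().strftime("%Y%m%d_%H")  # assumed format
    random_str = uuid.uuid4().hex[:8]                          # assumed scheme
    return f"{request_log_path}/{date_hour}/{request_id}_{random_str}.pb"

# e.g. "/request_log/<date_hour>/req-123_<random_str>.pb"
example = saved_pb_path("/request_log", "req-123")
```

Knowing this layout helps when scanning the mounted OSS bucket for the files to parse in the next step.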

To parse the obtained protobuf files, use EAS-Python-SDK version 0.35 or later, or EAS-Java-SDK version 2.0.29 or later. The following code provides a Python example:

from eas_prediction.torchrec_predict_pb2 import PBLogData

with open('xxxx.pb', 'rb') as f:
    pb_data = f.read()

pb_log = PBLogData()
pb_log.ParseFromString(pb_data)
print(pb_log)                    # Print the full log entry

print(pb_log.request)            # Print the original request
print(pb_log.raw_features)       # Print the raw item-side features
print(pb_log.generate_features)  # Print the item-side features after feature generation (FG)