All Products
Search
Document Center

Platform For AI:DSSM vector recall

Last Updated:Mar 31, 2026

This tutorial shows you how to build a Deep Structured Semantic Model (DSSM) two-tower vector recall pipeline using PAI's visual pipeline designer. By the end, you will have:

  • Created a DSSM recall pipeline from a preset template

  • Trained a model using negative sampling on 30-day behavioral data

  • Generated user and item vectors for recall

  • Evaluated recall quality using hit_rate@top200

Prerequisites

Before you begin, complete the feature engineering workflow to generate the following datasets:

  • rec_sln_demo_user_table_preprocess_all_feature_v2

  • rec_sln_demo_item_table_preprocess_all_feature_v2

  • rec_sln_demo_behavior_table_preprocess_v2

Open Machine Learning Designer

  1. Log on to the Machine Learning Platform for AI (PAI) console.

  2. In the left-side navigation pane, click Workspaces, then click the name of the workspace you want to use.

  3. In the left-side navigation pane, choose Model Development and Training > Visualized Modeling (Designer).

Create the pipeline

  1. On the Visualized Modeling (Designer) page, click the Preset Templates tab.

  2. In the Recommended Solution - Vector Recall section, click Create.

  3. In the Create Pipeline dialog box, configure the parameters. You can keep the default values.

    The Pipeline Data Path parameter specifies an Object Storage Service (OSS) bucket path where the pipeline stores temporary data and model artifacts during runtime.

  4. Click OK.

    Pipeline creation takes about 10 seconds. When Recommended Solution - Vector Recall appears in the pipeline list, the pipeline is ready.

  5. Double-click Recommended Solution - Vector Recall to open the pipeline on the canvas.

Understand the pipeline components

The preset template creates a complete DSSM recall pipeline. The components run in the following order:

Pipeline canvas showing components 1 through 18
ComponentWhat it does
1The sample model DSSM_Recall for vector recall
2Applies feature generation (FG) to the sample model
3Creates a positive sample table and uses it for negative sampling training
4Applies equal frequency binning to numerical features to set the model's boundaries parameter
5Counts unique values of enumeration features to set the model's embedding_dim and hash_bucket_size parameters
6Applies FG to item features
7Applies FG to user features
8Aggregates results from rec_sln_demo_dssm_recall_30d_binning_v1 and rec_sln_demo_dssm_recall_30d_count_v1 to calculate feature configuration and step configuration
9Creates an item table for negative sampling
10Discretizes 30-day sample data from the DSSM_Recall model to generate training samples
11Specifies the EasyRec configuration file based on the output of component 8
12You need to run component 11 to generate the EasyRec configuration file before model training
13Runs inference with the split item model on rec_sln_demo_dssm_recall_item_feature_fg_encoded_v1 to produce item vectors
14Runs inference with the split user model on rec_sln_demo_dssm_recall_user_feature_fg_encoded_v1 to produce user vectors
15Creates a sequence table and evaluates recall quality using hit_rate
18Calculates the final evaluation metric: hit_rate@top200
Note

Component 12 requires component 11 to complete first, because the EasyRec configuration file must exist before model training starts.

Note

The hit_rate evaluation in components 15 and 18 excludes new users and new items that appear on the evaluation day, because there is no historical data to recall for them.

Run the pipeline and view results

  1. In the top toolbar of the canvas, click Run.

    The pipeline runs all components in dependency order. Monitor progress by watching each component's status on the canvas. When all components show a completed status, the pipeline has finished.

  2. Right-click component 18 (18_rec_sln_demo_recall_total_hit_rate_v1_2) and choose View Data > hit_rate_detail to see recall hit rate by segment.

    hit_rate_detail view

  3. Right-click component 18 and choose View Data > total_hit_rate to see the overall recall hit rate.

    total_hit_rate view