Use FM-Embedding for recommendation - vector-based recall

Last Updated: May 14, 2020

Background

The data and procedure of this experiment are built into the corresponding template on the home page of Machine Learning Platform for AI (PAI) Studio at https://data.aliyun.com/product/learn.

Log on to PAI Studio. In the lower part of the FM-Embedding for Rec-System template, click Create. The template is ready for use.

AI-based recommendation is divided into two modules: sorting (ranking) and recall. The recall module represents each user and each to-be-recommended item as a vector. The inner product of a user vector and an item vector indicates the user's interest in the item. The following experiment shows how to create descriptive vectors for users and items from real-life recommendation data by using the Factorization Machine (FM) algorithm and the Embedding algorithm that are provided by PAI.
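The recall idea can be illustrated with a minimal sketch, assuming the user and item vectors are already available as plain Python lists. The function names `dot` and `recall_top_k` and the sample vectors are hypothetical and only serve the illustration; they are not part of PAI.

```python
import heapq

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def recall_top_k(user_vec, item_vecs, k):
    """Return the k item IDs with the highest inner-product score."""
    scored = ((dot(user_vec, vec), item_id) for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]

# Hypothetical 3-dimensional vectors, for illustration only.
user_vec = [0.5, -1.0, 2.0]
item_vecs = {
    "item_a": [1.0, 0.0, 1.0],
    "item_b": [0.0, 1.0, 0.0],
    "item_c": [2.0, 1.0, 0.5],
}
top2 = recall_top_k(user_vec, item_vecs, 2)
print(top2)
```

A higher score means a stronger predicted interest, so the recall module would pass the top-scoring items on to the sorting module.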

Procedure

Flowchart:

1. Data

Raw data:

Data fields:

  • userid: the ID of a user
  • age: the age of the user
  • gender: the gender of the user
  • itemid: the ID of an item
  • price: the price of the item
  • size: the size of the item
  • label: the target column, indicating whether the item is purchased. 1 indicates that the item is purchased. 0 indicates that the item is not purchased.

2. One-hot encoding

One-hot encoding converts categorical (character-type) data to numeric data. In the FM-Embedding solution, one-hot encoding-1 encodes the full data set and creates an encoding model. This model is imported to one-hot encoding-2 and one-hot encoding-3, so that all three components map each feature value to the same index. One-hot encoding-2 encodes only the features of the user, and one-hot encoding-3 encodes only the features of the item.

Enter userid, gender, and age in one-hot encoding-2, and select userid as the additional column.

Enter itemid, price, and size in one-hot encoding-3, and select itemid as the additional column.
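The shared-model idea can be sketched in plain Python. This is an illustration under stated assumptions, not the PAI component itself: the helpers `fit_one_hot` and `transform`, the sample rows, and the sparse `index:1` output format are all hypothetical, and every column is treated as categorical.

```python
def fit_one_hot(rows, columns):
    """Assign one index to every distinct (column, value) pair seen in rows."""
    index = {}
    for row in rows:
        for col in columns:
            key = (col, row[col])
            if key not in index:
                index[key] = len(index)
    return index

def transform(row, columns, index):
    """Encode the selected columns of one row as a sparse 'index:1' string."""
    ids = sorted(index[(col, row[col])] for col in columns if (col, row[col]) in index)
    return ",".join(f"{i}:1" for i in ids)

# Hypothetical sample rows; the values are made up for illustration.
rows = [
    {"userid": "u1", "age": "20", "gender": "F", "itemid": "i1", "price": "low", "size": "S"},
    {"userid": "u2", "age": "30", "gender": "M", "itemid": "i1", "price": "low", "size": "S"},
]
all_cols = ["userid", "age", "gender", "itemid", "price", "size"]

# Fit once on the full data, then reuse the same mapping for user and item features.
index = fit_one_hot(rows, all_cols)
user_kv = transform(rows[1], ["userid", "age", "gender"], index)
item_kv = transform(rows[1], ["itemid", "price", "size"], index)
print(user_kv, item_kv)
```

Because both transforms reuse one mapping, a feature index in the user encoding never collides with a feature index in the item encoding.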

3. FM training

Regularization coefficient and Dimension each take three values, one for each part of the FM model: the constant term, the linear (first-order) term, and the quadratic (second-order) term. The third value of Dimension, 10 in this experiment, sets the dimension of the embedding vectors that the model produces.
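For reference, the three terms correspond to the standard FM scoring function, which adds a constant, a weighted sum of the features, and a pairwise interaction weighted by the inner product of per-feature factor vectors. The sketch below shows that textbook form, assuming a sparse sample stored as a dictionary; the function name `fm_score` and all parameter values are hypothetical, and the sketch does not claim to match the internals of the PAI component.

```python
def fm_score(x, w0, w, v):
    """FM raw score for a sparse sample x = {feature_id: value}.

    w0: constant term
    w:  {feature_id: linear weight}
    v:  {feature_id: list of k factors} used by the quadratic term
    """
    linear = sum(w[i] * xi for i, xi in x.items())
    k = len(next(iter(v.values())))
    pairwise = 0.0
    for f in range(k):
        # Identity: sum over i<j of v_if*v_jf*x_i*x_j
        #         = 0.5 * ((sum_i v_if*x_i)^2 - sum_i (v_if*x_i)^2)
        s = sum(v[i][f] * xi for i, xi in x.items())
        s_sq = sum((v[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

# Hypothetical parameters with k = 2 (the experiment uses 10).
x = {0: 1.0, 3: 1.0}
w0 = 0.1
w = {0: 0.2, 3: -0.3}
v = {0: [1.0, 2.0], 3: [0.5, -1.0]}
score = fm_score(x, w0, w, v)
print(score)
```

The length of each factor list in `v` plays the role of the third Dimension value: it determines how many components each embedding vector has.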

4. Embedding extraction

  • Name of the Embedding Vector ID Column: Enter feature_id, the ID column of the FM training model that is connected to the left input port.
  • Embedding vector column name: Enter feature_weights, the weight column of the FM training model that is connected to the left input port.
  • Weight vector column name: Enter the name of the sparse data column in the table that is connected to the right input port.
  • Output result column name: Enter a name for the output Embedding field.
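One plausible reading of the extraction step is that each row's embedding is the sum of the factor vectors of its active features, weighted by the feature values. The sketch below illustrates that reading only. It assumes the feature_weights values have already been parsed into lists of floats and that the sparse column uses an `index:value` format; the function name `extract_embedding` and all numbers are hypothetical.

```python
def extract_embedding(sparse_kv, feature_vectors):
    """Sum the factor vectors of the active features, weighted by their values."""
    k = len(next(iter(feature_vectors.values())))
    emb = [0.0] * k
    for pair in sparse_kv.split(","):
        fid, val = pair.split(":")
        vec = feature_vectors.get(int(fid))
        if vec is None:  # feature not present in the trained model
            continue
        for f in range(k):
            emb[f] += float(val) * vec[f]
    return emb

# Hypothetical model table: feature_id -> factor vector (k = 2).
feature_vectors = {
    3: [1.0, 1.0], 4: [0.0, 1.0], 5: [2.0, 0.0],   # item features
    6: [1.0, 0.0], 7: [0.5, 0.5], 8: [0.0, 2.0],   # user features
}
user_emb = extract_embedding("6:1,7:1,8:1", feature_vectors)
item_emb = extract_embedding("3:1,4:1,5:1", feature_vectors)

# Recall score: inner product of the two embeddings.
score = sum(a * b for a, b in zip(user_emb, item_emb))
print(user_emb, item_emb, score)
```

Under this reading, the inner product of the user and item embeddings equals the sum of all user-item cross terms in the quadratic part of the FM model, which is why it can serve as a recall score.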

Final output:

Summary

PAI provides the FM-Embedding solution, which lets you quickly derive feature vectors for users and items. The recall module then scores each user-item pair by the inner product of the user's and the item's feature vectors.