This topic describes how to use Bipartite Graph SAmple and aggreGatE (GraphSAGE) to obtain feature vectors of users and items for matching recall.

Background information

Graph neural networks are a widely discussed topic in deep learning. The open source Graph-Learn framework of Machine Learning Platform for AI (PAI) provides a large number of graph learning algorithms. GraphSAGE is a graph neural network algorithm that can be used for matching. Bipartite GraphSAGE extends GraphSAGE to process bipartite graphs and is used by Taobao for matching recall.

In a bipartite graph, each user and each item is represented by a vertex. An interaction between a user and an item, such as a click or a purchase, is represented by an edge. For each user vertex, the system samples adjacent vertices along the meta path User-Item-User-Item.... For each item vertex, the system samples adjacent vertices along the meta path Item-User-Item-User....
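The meta-path sampling described above can be sketched in plain Python. This is only an illustration of the idea, not the Graph-Learn implementation: the function names `build_bipartite` and `sample_meta_path`, the `fanouts` parameter, and sampling with replacement are all assumptions made for this example.

```python
import random
from collections import defaultdict

def build_bipartite(edges):
    """Build user->items and item->users adjacency lists from (user, item) pairs."""
    user_to_items = defaultdict(list)
    item_to_users = defaultdict(list)
    for user, item in edges:
        user_to_items[user].append(item)
        item_to_users[item].append(user)
    return user_to_items, item_to_users

def sample_meta_path(start, start_is_user, user_to_items, item_to_users, fanouts, rng):
    """Sample neighbors hop by hop, switching sides of the graph at every hop.

    Returns a list of layers: layers[0] is [start], and layers[k] holds the
    vertices sampled at hop k (hypothetical layout chosen for this sketch).
    """
    layers = [[start]]
    on_user_side = start_is_user
    for fanout in fanouts:
        adjacency = user_to_items if on_user_side else item_to_users
        next_layer = []
        for vertex in layers[-1]:
            neighbors = adjacency.get(vertex, [])
            if neighbors:
                # Assumption: sample with replacement so that every vertex
                # contributes exactly `fanout` neighbors.
                next_layer.extend(rng.choices(neighbors, k=fanout))
        layers.append(next_layer)
        on_user_side = not on_user_side
    return layers

# Toy behavior data: (user ID, item ID) pairs.
edges = [(1, 101), (1, 102), (2, 101), (3, 103), (2, 103)]
user_to_items, item_to_users = build_bipartite(edges)
rng = random.Random(0)

# User-Item-User path that starts at user 1.
user_layers = sample_meta_path(1, True, user_to_items, item_to_users, [2, 2], rng)
# Item-User-Item path that starts at item 101.
item_layers = sample_meta_path(101, False, user_to_items, item_to_users, [2, 2], rng)
```

In this sketch, the first hop from a user always lands on items and the second hop lands back on users, which mirrors the alternating User-Item-User-Item... pattern. A GraphSAGE-style model would then aggregate the features of the sampled layers into the embedding of the starting vertex.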

Procedure

  1. Go to the Machine Learning Studio console.
    1. Log on to the PAI console.
    2. In the left-side navigation pane, choose Model Training > Studio-Modeling Visualization.
    3. On the PAI Visualization Modeling page, find the project in which you want to create an experiment and click Machine Learning in the Operation column.
  2. Create an experiment.
    1. In the left-side navigation pane, click Home.
    2. In the Templates section, click Create below RecSys-GraphEmbedding.
    3. In the New Experiment dialog box, set the experiment parameters. You can use the default values of the parameters.
      Parameter     Description
      Name          The name of the experiment. Default value: RecSys-GraphEmbedding.
      Project       The project in which you want to create the experiment. You cannot change the value of this parameter.
      Description   The description of the experiment. Default value: Rec system GraphEmbedding matching.
      Save To       The directory for storing the experiment. Default value: My Experiments.
    4. Click OK.
    5. Optional: Wait about 10 seconds. Then, click Experiments in the left-side navigation pane.
    6. Optional: Click RecSys-GraphEmbedding_XX under My Experiments. The canvas of the experiment appears.
      My Experiments is the directory that stores the experiment that you created, and RecSys-GraphEmbedding_XX is the name of the experiment. In the experiment name, _XX is the ID that the system automatically generates for the experiment.
    7. View the components of the experiment on the canvas, as shown in the following figure. The system automatically creates the experiment based on the preset template.
      Figure: Experiment on matching recall by using Bipartite GraphSAGE
      Component No.   Description
      1               This component imports data from the table that records user behavior on items. The table includes the following fields:
                      • user: the ID of the user. The value must be of the BIGINT type.
                      • item: the ID of the item. The value must be of the BIGINT type.
                      • weight: the behavior that the user performed on the item. The value must be of the DOUBLE type. For example, the value 1 indicates that the user purchased the item, and the value 2 indicates that the user added the item to favorites.
      2               This component imports data from the user feature table. The table includes the following fields:
                      • user: the ID of the user. The value must be of the BIGINT type.
                      • feature: one or more features of the user. The value must be of the STRING type. If the user has multiple features, separate them with colons (:). The feature value 0 must be included in the value of feature. Each feature must be a FLOAT-type number. The system processes the features as continuous features.
      3               This component imports data from the item feature table. The table includes the following fields:
                      • item: the ID of the item. The value must be of the BIGINT type.
                      • feature: one or more features of the item. The value must be of the STRING type. If the item has multiple features, separate them with colons (:). The feature value 0 must be included in the value of feature. Each feature must be a FLOAT-type number. The system processes the features as continuous features.
      4               This component generates a user vector table and an item vector table for matching recall.
  3. Run the experiment and view the result.
    1. In the top toolbar of the canvas, click Run.
    2. After the experiment is run, right-click graphSage-1 on the canvas and choose View Data > View Output Port 1. In the dialog box that appears, view the feature vectors that are generated for users.
    3. Right-click graphSage-1 on the canvas and choose View Data > View Output Port 2. In the dialog box that appears, view the feature vectors that are generated for items.
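The input format from step 2 and one possible use of the output vectors can be illustrated with a short Python sketch. It is not part of the template. The helper names (`format_features`, `parse_features`, `recall_top_k`), the sample IDs and vectors, and the choice of inner-product scoring are all assumptions made for this example. The actual layout of the output vector tables is not specified here, so the vectors are assumed to be already loaded as lists of floats.

```python
import heapq

def format_features(values):
    """Join numeric features into the colon-separated STRING form used by the feature tables."""
    return ":".join(str(float(v)) for v in values)

def parse_features(feature):
    """Split a colon-separated feature string back into a list of floats."""
    return [float(part) for part in feature.split(":")]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def recall_top_k(user_vector, item_vectors, k):
    """Return the IDs of the k items whose vectors score highest against user_vector.

    Assumption: similarity is the inner product; other measures are possible.
    """
    scored = ((dot(user_vector, vec), item_id) for item_id, vec in item_vectors.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]

# Input side: one row of the user feature table (user BIGINT, feature STRING).
user_feature_row = (1001, format_features([0.5, 1, 0]))

# Output side: made-up vectors standing in for the two output ports of graphSage-1.
user_vectors = {1001: [1.0, 0.0]}
item_vectors = {101: [0.9, 0.1], 102: [0.1, 0.9], 103: [0.5, 0.5]}
recalled = recall_top_k(user_vectors[1001], item_vectors, k=2)
```

A brute-force scan like this is only practical for small item sets. For large catalogs, an approximate nearest-neighbor index over the item vectors is a common alternative.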