Use Bipartite GraphSAGE for matching recall

This topic describes how to use Bipartite Graph SAmple and aggreGatE (GraphSAGE) to obtain feature vectors of users and items for matching recall.

Background information

Graph neural networks are a widely discussed topic in deep learning. Graph-Learn, the open source framework of Platform for AI (PAI), provides a large number of graph learning algorithms. GraphSAGE is a graph neural network algorithm that learns the vector of a vertex by sampling its adjacent vertices and aggregating their features. Bipartite GraphSAGE is an extension of GraphSAGE that processes bipartite graphs. Taobao uses Bipartite GraphSAGE for matching recall.

In a bipartite graph, each user or item is represented by a vertex. An interaction between a user and an item, such as a click or a purchase, is represented by an edge. For each user vertex and each item vertex, the system samples adjacent vertices along the meta paths User-Item-User-Item... and Item-User-Item-User... .
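The meta-path sampling described above can be sketched as follows. This is a minimal illustration and not the Graph-Learn implementation: the function names, the fan-out list, and the choice of uniform sampling with replacement are all assumptions made for the example.

```python
import random
from collections import defaultdict


def build_bipartite(edges):
    """Build user->items and item->users adjacency lists from (user, item, weight) rows."""
    user_to_items = defaultdict(list)
    item_to_users = defaultdict(list)
    for user, item, _weight in edges:
        user_to_items[user].append(item)
        item_to_users[item].append(user)
    return user_to_items, item_to_users


def sample_neighbors(adjacency, vertex, k, rng):
    """Sample k adjacent vertices uniformly with replacement (an assumed strategy)."""
    neighbors = adjacency.get(vertex, [])
    if not neighbors:
        return []
    return rng.choices(neighbors, k=k)


def sample_meta_path(start, start_is_user, user_to_items, item_to_users, fanouts, rng):
    """Expand one hop per fan-out value, alternating between the user side and the item side.

    Starting from a user follows User-Item-User-...; starting from an item
    follows Item-User-Item-... . Returns one list of vertex IDs per hop.
    """
    layers = [[start]]
    is_user = start_is_user
    for k in fanouts:
        adjacency = user_to_items if is_user else item_to_users
        next_layer = []
        for vertex in layers[-1]:
            next_layer.extend(sample_neighbors(adjacency, vertex, k, rng))
        layers.append(next_layer)
        is_user = not is_user
    return layers
```

For example, `sample_meta_path(1, True, user_to_items, item_to_users, [2, 2], random.Random(0))` returns three layers: the starting user, two sampled items, and four sampled users. A GraphSAGE-style model would then aggregate features from the outermost layer inward.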

Prerequisites

A workspace is created. For more information, see Create a workspace.
MaxCompute resources are associated with the workspace. For more information, see Manage workspaces.

Go to the Machine Learning Designer page.
1. Log on to the PAI console.
2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
3. In the left-side navigation pane, choose Model Training > Visualized Modeling (Designer) to go to the Machine Learning Designer page.

Create a pipeline.

1. On the Visualized Modeling (Designer) page, click the Preset Templates tab.
2. On the Preset Templates tab, click Create in the RecSys-GraphEmbedding section.
3. In the Create Pipeline dialog box, configure the parameters. You can use the default values.
   The Pipeline Data Path parameter specifies the Object Storage Service (OSS) bucket path that stores the temporary data and models generated while the pipeline runs.
4. Click OK.
   It takes about 10 seconds to create the pipeline.
5. On the pipelines tab, double-click the RecSys-GraphEmbedding pipeline to open it.

View the components of the pipeline on the canvas as shown in the following figure. The system automatically creates the pipeline based on the preset template.

Figure: Graph neural network recall

Area	Description
①	The node imports data from the table that records user behavior on items. The table contains the following fields: user: the ID of the user. The value must be of the BIGINT type. item: the ID of the item. The value must be of the BIGINT type. weight: the behavior that the user performed on the item. The value must be of the DOUBLE type. For example, a value of 1 indicates that the user purchased the item, and a value of 2 indicates that the user added the item to favorites.
②	The node imports data from the user feature table. The table contains the following fields: user: the ID of the user. The value must be of the BIGINT type. feature: one or more features of the user. The value must be of the STRING type. Each user must have at least one feature. Separate multiple features with colons (:). Each feature must be a FLOAT-type number. The system processes the features as continuous features. Example: `1:1:1`.
③	The node imports data from the item feature table. The table contains the following fields: item: the ID of the item. The value must be of the BIGINT type. feature: one or more features of the item. The value must be of the STRING type. Each item must have at least one feature. Separate multiple features with colons (:). Each feature must be a FLOAT-type number. The system processes the features as continuous features. Example: `1:1:2`.
④	The node generates a user vector table and an item vector table for matching recall.
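Before you upload your own feature tables, it can help to check the rows against the format described in the table above. The following sketch is a hypothetical local check, not part of PAI: the helper names are invented for the example, and the requirement that every row has the same number of features is an assumption that this topic does not state.

```python
def parse_features(feature):
    """Parse a colon-separated feature string such as "1:1:1" into a list of floats."""
    if not isinstance(feature, str) or feature.strip() == "":
        raise ValueError("each user or item must have at least one feature")
    # float() raises ValueError for empty or non-numeric parts such as "1::2".
    return [float(part) for part in feature.split(":")]


def check_feature_table(rows):
    """Validate (id, feature) rows and return the feature dimension.

    Assumption: all rows share the same number of features.
    """
    dims = set()
    for entity_id, feature in rows:
        if not isinstance(entity_id, int):
            raise TypeError(f"ID {entity_id!r} is not an integer (BIGINT expected)")
        dims.add(len(parse_features(feature)))
    if len(dims) != 1:
        raise ValueError(f"expected one feature dimension, found {sorted(dims)}")
    return dims.pop()
```

Calling `check_feature_table([(1, "1:1:1"), (2, "0.5:2:3")])` returns the feature dimension, and a malformed row raises an error that identifies the problem before the pipeline runs.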

Run the pipeline and view the results.
1. In the upper-left corner of the canvas, click the Run icon.
2. After the pipeline run is complete, right-click the graphSage component on the canvas and choose View Data > user_embedding. In the dialog box that appears, view the feature vectors that are generated for users.
3. Right-click the graphSage component again and choose View Data > item_embedding. In the dialog box that appears, view the feature vectors that are generated for items.
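One common way to use the generated vectors for matching recall is to score each item against a user vector and keep the highest-scoring items. The following sketch assumes that the vectors are already loaded into Python lists of floats and that the inner product is the similarity measure. Both are assumptions for illustration; this topic does not specify the storage format of the vectors or the scoring method.

```python
import heapq


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def recall_top_k(user_vector, item_vectors, k):
    """Return the IDs of the k items whose vectors score highest against the user vector.

    item_vectors maps an item ID to its vector.
    """
    scored = ((dot(user_vector, vec), item_id) for item_id, vec in item_vectors.items())
    return [item_id for _score, item_id in heapq.nlargest(k, scored)]
```

For large item sets, an exhaustive scan like this is typically replaced by an approximate nearest neighbor index, but the scoring idea stays the same.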