Innovation and Practice of Dynamic Network Representation Learning in the Field of Recommendation
At present, most Graph Embedding methods, such as node2vec, GCN, GraphSAGE and GAT, are mainly for static graph models, that is, assume that the graph will not change. But in real life, the relationship diagram often changes with time. For example, users' interest in dresses will gradually shift to high-heeled shoes over a period of time. In the relationship diagram, users' neighbors are mainly dresses in the previous period, but users' neighbors are mainly high-heeled shoes in the later period. If the modeling method of static graph is followed, the dress and high-heeled shoes will appear on the same graph. Although time information can be added to the relationship edge, it can only be used as the weight to control the walk or aggregation, and can not clearly model the user's interest changes in time sequence.
Dynamic network representation learning is also called Dynamic Graph Embedding (Dynamic Network Embedding). It can not only learn the structure information of the current network, but also learn the changes of the network in time. It is a popular direction of Graph Embedding at present. In recent years, algorithms related to dynamic network representation learning have also been proposed like mushrooms, such as DynamicTriad, DySAT, etc. But at present, it is mainly aimed at dynamic isomorphic networks. Inspired by DySAT and HAN algorithms, we propose a dynamic graph representation algorithm (DyHAN) based on hierarchical attention mechanism in dynamic heterogeneous networks, which is superior to the existing methods in offline evaluation.
In terms of business landing, considering the development difficulty and online exposure, we introduced dynamic models to better learn timing information based on the previous GraphSAGE i2i, and achieved certain results in business.
At the innovation level, we propose a dynamic graph representation algorithm based on hierarchical attention mechanism (DyHAN, short for Dynamic Heterogeneous Graph Embedding using Hierarchical Attention) for dynamic heterogeneous graphs of users and goods.
Here we mainly describe the construction information of our own dataset, and other datasets are similar. We use the user history behavior log to build.
Node type: user and product
Edge type: click, inquiry (AB), order, etc.
Time slicing: The user behavior of each day is taken as a time slicing. The 10-day time slicing is used for training and the 11-day time slicing is used for evaluation.
Node information: node id (here, in order to facilitate the experiment, the node id feature is directly used. If there are other features in the graph, they can also be added).
The graph thus constructed is a heterogeneous dynamic graph with 11 time slices, 2 node types and 3 edge types. Under a time slice, if we only look at the edges of the click type, we will get the subgraph of the click type under the time slice.
The algorithm is mainly a three-layer attention mechanism, and the model structure is shown in Figure 1. The fusion of the three layers here is the aggregation at the node level, edge level and time sequence level. In DyHAN, we all use the attention mechanism, but in fact, these three modules can be replaced by other aggregation methods. For example, the node level aggregation can use the mean, mean-pooling and max-pooling methods of GraphSAGE. Time level aggregation can use RNN class methods, such as LSTM and GRU.
The purpose of node-level aggregation is to make an attachment for each node in the sub-graph of each edge type under each time slice to fuse the information of itself and its neighbors. In this way, the fused vector can represent the semantic information of this node under this kind of edge type. Query is the node itself, and key is the node's neighbors (including itself).
Edge-level aggregation is mainly used to aggregate the edge type vectors of nodes under each time slice. A certain type of edge type vector may contribute a lot to this node. For example, the edge type vector of order contributes a lot to the traded products.
Temporal-level aggregation mainly aggregates the node vectors on each time slice. A standard Scaled-Dot-Product Attention is used here. M is a mask matrix, mainly to make the node vector only see the past node vector.
In order to increase the ability of express, the multi-head mechanism can be used for aggregation at each layer.
Cross Entropy is selected as the loss function. Positive and negative samples are only selected on the last time slice.
The recommended quantitative recall is to calculate the similarity of two nodes to predict the products that users are potentially interested in. In order to better fit the business, we choose edge prediction as our experimental task. Baseline selects representative algorithms for static and dynamic charts, such as DeepWalk, Metapath2vec, GraphSAGE and GAT; Dynamic ones mainly include DynamicTriad, DySAT, etc. We have carried out experiments on two public data sets and our own data sets. The following are the experimental results.
Considering that the current mechanism of ICBU recommendation engine has a low exposure ratio to user vectorization, we prefer to try on i2i in the online landing of dynamic graph representation, that is, directly introduce dynamic models on the basis of GraphSAGE i2i.
The construction of each time slice graph follows the previous GraphSAGE i2i pattern. Due to the mechanism of engineering implementation, only the node vector on the last time slice is transferred. Therefore, in order not to reduce the product coverage, each time slice is set to 90 days, and the appropriate overlay is set between time slices.
In this way, the model actually becomes a simplified version of DyHAN detailed in the previous article. It is divided into two layers. The first layer is the aggregation mechanism of GraphSAGE, which mainly calculates the node vector for each time slice. The second layer is the aggregation at the time sequence level, using the Scaled-Dot-Product Attention described above. The last training is unsupervised training. The selection of unsupervised learning samples has been optimized, and the loss function uses Triplet Loss.
Offline evaluation and online effect
Offline evaluation: we randomly select the first item under the session as the trigger, and calculate the coverage rate of the different items clicked in the same session. Ten thousand such samples are selected_ The coverage increment of i2i is 4.2%, using dynamic_ The coverage increment of i2i is 10.9%.
Online effect: the L-AB conversion rate increased by 3.54% and the L-O conversion rate increased by 14.23% when the Detail cross-store recommendation was launched. In terms of the conversion of the overall Detail page, the D-AB conversion rate increased by 0.85% and the D-O conversion rate increased by 2.57%.
In the study of dynamic chart features, we innovatively proposed the DyHAN method for heterogeneous dynamic chart modeling, and introduced the dynamic chart feature model in the recommendation field of Alibaba International Station (ICBU), which has achieved certain results in business. At the same time, we also found that the dynamic graph representation mode of time slicing is relatively expensive, because each time slicing needs to run a static graph representation model, and how to reduce the computational cost is a future research direction. At the same time, how to better integrate temporal information in the time dimension is also a research direction in the future.
Knowledge Base Team
Knowledge Base Team
Knowledge Base Team
Knowledge Base Team
Explore More Special Offers
50,000 email package starts as low as USD 1.99, 120 short messages start at only USD 1.00