Alibaba launches a new ranking model
Background
Take click optimization in e-commerce as an example: the task of a recommendation system is to select, from a large pool of candidate products, the ones a user is most interested in and most likely to click. For efficiency, retrieval is usually split into two stages. In the matching stage (also called recall or candidate generation), a small set of candidates (for example, 1,000) is selected from the entire candidate pool based on user-to-item (U2I) correlation, commonly with collaborative filtering methods. In the ranking stage, a ranking model estimates the click-through rate (CTR) of each of these candidates, and the candidates are sorted by that estimate before being shown to the user.
CTR estimation is central to a recommendation system, and personalization is the key to improving a CTR model. This article introduces a new ranking model. Its main idea is to bring the collaborative filtering idea from the matching stage into the ranking model and represent U2I correlation there, which improves the model's personalization ability and achieved good results.
In a search scenario, users state their intent explicitly by entering search terms. A recommendation scenario has no such explicit signal: the user's intent is usually hidden in the user behavior sequence. In that sense, the user behavior sequence is the query in recommendation, so modeling it to extract user intent is very important. Works such as DIN[1] and DIEN[2] focus on representing user interests to improve the model. Our work goes one step further on this basis and focuses on representing U2I correlation, which directly measures how strongly a user prefers the target item. This can be understood as an upgrade from user features (user interest representation) to U2I cross-features (U2I correlation representation).
Representing U2I correlation naturally brings to mind collaborative filtering (CF) from the matching stage. Item-to-item (I2I) CF is the most common method in industry: it precomputes I2I similarity and then derives U2I correlation indirectly from the user's behaviors and the I2I similarity. Factorization is more direct: U2I correlation is obtained as the inner product of the user representation and the item representation. We refer to this approach as U2I CF for now. Deep learning methods have recently entered both areas. In I2I CF, NAIS[7] uses an attention mechanism to distinguish the importance of user behaviors, similar to DIN[1]. In U2I CF, DNN4YouTube[3] models recall as a large-scale multi-class classification problem, an approach often referred to as DeepMatch. DeepMatch can be seen as a nonlinear generalization of factorization. We build two sub-networks, one based on U2I CF and one based on I2I CF, to represent U2I correlation.
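The two collaborative filtering routes to a U2I score can be contrasted in a minimal sketch. Everything here is illustrative: the function names, item IDs, similarity values, and embedding vectors are assumptions made up for the example, not data or code from the DMR work.

```python
# Toy illustration only: item IDs, similarities, and vectors are made up.

def u2i_via_i2i(user_history, target, i2i_sim):
    """Indirect U2I score (I2I CF): add up the precomputed similarity
    between each item in the user's history and the target item."""
    return sum(i2i_sim.get((item, target), 0.0) for item in user_history)


def u2i_via_factorization(user_vec, item_vec):
    """Direct U2I score (U2I CF): inner product of the user and item vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


# Precomputed I2I similarities; missing pairs count as 0.
i2i_sim = {("phone", "case"): 0.5, ("laptop", "case"): 0.25}
history = ["phone", "laptop", "book"]

score_i2i = u2i_via_i2i(history, "case", i2i_sim)
score_mf = u2i_via_factorization([0.5, 1.0], [2.0, 0.25])
print(score_i2i, score_mf)
```

The first function needs only the behavior list and a similarity table, while the second needs learned representations for both the user and the item; the two sub-networks described in this article correspond to these two routes.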
Model introduction
The network structure of the DMR (Deep Match to Rank) model is shown in the figure. U2I correlation is hard to capture through the implicit feature crossing of an MLP alone. So, in addition to manually constructed U2I cross-features, we feed the MLP with U2I correlation representations produced by a User-to-Item sub-network and an Item-to-Item sub-network, which further improves the expressive power of the model.
User-to-Item Network
Inspired by factorization, we represent U2I correlation as the inner product of the user representation and the item representation, which can be regarded as an explicit feature cross. The user representation is built from the user's behavior features. A simple approach is average pooling, which treats every behavior as equally important. Because contextual features such as behavior time matter for telling behaviors apart, we instead use an attention mechanism in which contextual features such as positional encoding (see Transformer[4]) act as the query and adaptively learn the weight of each behavior. Here the position is a behavior's serial number in the sequence ordered by behavior time, so it expresses how recent the behavior is. The formula is as follows:
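The formula itself did not survive in this copy of the article. A reconstruction consistent with the definitions in the surrounding text is sketched here. The symbol names are assumptions: p_t is the t-th position embedding, e_t is the feature vector of the t-th behavior, W_p, W_e, b, and z are learnable parameters, a_t is the unnormalized score, and α_t is the normalized weight. The additive (tanh) scoring form is also an assumption and should be checked against the paper.

```latex
a_t = \mathbf{z}^{\top} \tanh\left(\mathbf{W}_p \mathbf{p}_t + \mathbf{W}_e \mathbf{e}_t + \mathbf{b}\right),
\qquad
\alpha_t = \frac{\exp(a_t)}{\sum_{i=1}^{T} \exp(a_i)}
```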
Here p_t is the t-th position embedding, e_t is the feature vector of the t-th user behavior, W_p, W_e, b, and z are learnable parameters, and α_t is the normalized weight of the t-th user behavior. Weighted sum pooling yields a fixed-length feature vector, which a fully connected layer then transforms nonlinearly into the user representation so that its dimension matches that of the item representation. The final user representation can be defined as:
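The definition is missing from this copy. Under the same assumed notation (α_t for the normalized weight, e_t for the behavior feature vector, g for the fully connected transformation, h_t for the weighted behavior vector, and u for the user representation), a reconstruction that matches the surrounding description would read:

```latex
\mathbf{u} = g\left(\sum_{t=1}^{T} \alpha_t \mathbf{e}_t\right) = g\left(\sum_{t=1}^{T} \mathbf{h}_t\right),
\qquad \mathbf{h}_t = \alpha_t \mathbf{e}_t
```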
Here the function g(·) is a nonlinear transformation with input dimension d_e and output dimension d_v, and h_t is the weighted feature vector of the t-th user behavior.
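The position-aware pooling described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the additive tanh scoring, the helper names `matvec` and `attention_pool`, and all toy parameter values are assumptions, and the final fully connected transformation is omitted.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attention_pool(positions, behaviors, W_p, W_e, b, z):
    """Score each behavior from its position embedding and feature vector,
    softmax-normalize the scores, and return (weights, weighted sum)."""
    scores = []
    for p_t, e_t in zip(positions, behaviors):
        pre = [a + c + d for a, c, d in zip(matvec(W_p, p_t), matvec(W_e, e_t), b)]
        hidden = [math.tanh(v) for v in pre]
        scores.append(sum(zi * hi for zi, hi in zip(z, hidden)))
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [x / total for x in exps]
    dim = len(behaviors[0])
    pooled = [sum(a * e[i] for a, e in zip(alpha, behaviors)) for i in range(dim)]
    return alpha, pooled


# Toy inputs (assumed values): two behaviors with 2-d embeddings.
positions = [[1.0, 0.0], [0.0, 1.0]]
behaviors = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
alpha, pooled = attention_pool(positions, behaviors, identity, zeros, [0.0, 0.0], [1.0, 0.0])
print(alpha, pooled)
```

With these toy parameters the score depends only on position, so the first behavior receives the larger weight. Average pooling is the special case where every weight equals 1/T.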
The target item representation v' is obtained directly by embedding lookup. The lookup uses a separate output embedding matrix V', which is different from the embedding matrix V used for items on the input side (just as a word in word2vec[6] has both an input and an output representation). Given the user representation and the item representation, we use their inner product to represent U2I correlation:
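The expression is missing from this copy. Writing u for the user representation and v' for the output representation of the target item (both symbol names are assumptions), the inner product described in the text is:

```latex
r = \mathbf{u}^{\top} \mathbf{v}'
```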
We want a larger r to mean a stronger correlation and therefore a positive effect on CTR prediction. However, from the perspective of backpropagation, this behavior is hard to learn from the click label alone. In addition, the output embedding matrix would be learned entirely through the single correlation unit r. For these two reasons we propose the DeepMatch network (labeled Auxiliary Match Network in the figure), which uses user behaviors as labels to supervise the learning of the User-to-Item network.
The task of the DeepMatch network is to predict the T-th behavior from the previous T−1 behaviors. This is a large-scale multi-class classification task with as many classes as there are candidate products. Using the form of the user representation above, we obtain the user representation corresponding to the first T−1 user behaviors, denoted u_{T−1}. The probability that the user clicks item j next, after these T−1 behaviors, can be defined by the softmax function:
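The softmax expression is missing from this copy. With u_{T-1} for the user representation built from the first T−1 behaviors, v'_j for the output representation of item j, and K for the number of items (the vector symbol names are assumptions), a reconstruction consistent with the text is:

```latex
p_j = \frac{\exp\left(\mathbf{u}_{T-1}^{\top} \mathbf{v}'_j\right)}
           {\sum_{i=1}^{K} \exp\left(\mathbf{u}_{T-1}^{\top} \mathbf{v}'_i\right)}
```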
where v'_j is the (output) representation of the j-th item. The output representations of the items are in effect the parameters of the softmax layer. With cross-entropy as the loss function, we have the following loss:
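The loss is missing from this copy. A standard cross-entropy form that matches the definitions in the text is sketched here, where N, the number of training samples, and the symbol L_aux are assumptions introduced for the reconstruction:

```latex
L_{\text{aux}} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{K} y_j^{i} \log p_j^{i}
```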
Here y_j^i is the label of the j-th item for the i-th sample, p_j^i is the corresponding prediction, and K is the number of classes, that is, the number of items. The label y_j^i equals 1 if and only if item j is the T-th behavior in the user behavior sequence. Because the cost of the full softmax is proportional to the total number of items K and is therefore too large, negative sampling is used to simplify the computation, and the loss takes the following form:
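The negative-sampling loss is missing from this copy. A reconstruction in the usual word2vec style, consistent with the softmax term above, is sketched here. The symbols are assumptions: σ is the sigmoid function, v'_o is the output representation of the positive item, v'_j ranges over the k sampled negative items, and N is the number of training samples.

```latex
L_{\text{NS}} = -\frac{1}{N} \sum_{i=1}^{N} \left(
    \log \sigma\left(\mathbf{u}_{T-1}^{\top} \mathbf{v}'_o\right)
    + \sum_{j=1}^{k} \log \sigma\left(-\mathbf{u}_{T-1}^{\top} \mathbf{v}'_j\right)
\right)
```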
Here σ is the sigmoid function, v'_o is the positive sample, v'_j is a negative sample, and k is the number of negative samples, which is much smaller than the total number of items K. The DeepMatch loss is added to the final classification loss of the MLP. The DeepMatch network pushes a larger inner product r to represent a stronger correlation, which helps the training of the model.
In effect, the User-to-Item network trains the ranking model and the matching model jointly in a unified way. This is different from simply passing matching-stage features such as match_type and match_score to the ranking model. The matching stage usually combines several recall channels, and the scores of different recall methods are not on the same scale, so they cannot be compared directly (for example, the scores of Swing and DeepMatch are not comparable). With the User-to-Item network, DMR can represent U2I correlation for any given target item, and the resulting values are comparable with one another.
Item-to-Item Network
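The negative-sampling loss and its addition to the ranking loss can be sketched for a single training sample as follows. This is a minimal illustration, not the authors' code: the function names, the toy vectors, and the weight `beta` are assumptions (the text only says the two losses are added, which corresponds to `beta = 1.0`).

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def negative_sampling_loss(user_vec, pos_item_vec, neg_item_vecs):
    """Per-sample loss: reward a large inner product with the positive item
    and a small one with each sampled negative item."""
    loss = -log_sigmoid(dot(user_vec, pos_item_vec))
    for neg_vec in neg_item_vecs:
        loss -= log_sigmoid(-dot(user_vec, neg_vec))
    return loss


def joint_loss(ctr_loss, match_loss, beta=1.0):
    """Total loss: CTR classification loss plus the auxiliary match loss.
    beta is a hypothetical weighting knob; 1.0 is a plain sum."""
    return ctr_loss + beta * match_loss


# Toy vectors (assumed values): the user vector aligns with the positive item.
user = [1.0, 0.0]
good_loss = negative_sampling_loss(user, [2.0, 0.0], [[-2.0, 0.0]])
bad_loss = negative_sampling_loss(user, [-2.0, 0.0], [[2.0, 0.0]])
print(good_loss, bad_loss, joint_loss(0.5, good_loss))
```

The loss is lower when the user representation has a large inner product with the item that was actually clicked next, which is the pressure that makes a larger r correspond to a stronger correlation.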
The User-to-Item network represents U2I correlation directly through an inner product, while the Item-to-Item network represents it indirectly by computing I2I similarity. Recall the target attention used in models such as DIN[1]: the target item serves as the query and attends over the user behavior sequence to distinguish the importance of each behavior. This can be understood as an I2I similarity computation, in which behaviors on items more similar to the target item receive higher weights and therefore dominate the pooled feature vector. Based on this view, we sum all the attention weights (before softmax normalization) to obtain another representation of U2I correlation. The formula is as follows:
The Item-to-Item network computes attention in additive form [5], unlike the inner product form used by the User-to-Item network, which enhances the representation ability.
In addition to the U2I correlation representation, the Item-to-Item network also feeds the user representation obtained after target attention into the MLP. Without the U2I correlation representations and positional encoding, DMR is essentially the same as the DIN[1] model.
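The formula is missing from this copy. A reconstruction in additive-attention form is sketched here with assumed symbols: e_c is the feature vector of the target item, e_t and p_t are the feature vector and position embedding of the t-th behavior, W_c, W_p, W_e, b, and z are learnable parameters, and r_I2I is a name introduced here to keep this score apart from the inner product r. Including the position term is an assumption based on the article's remark that DMR uses positional encoding.

```latex
\hat{a}_t = \mathbf{z}^{\top} \tanh\left(\mathbf{W}_c \mathbf{e}_c + \mathbf{W}_p \mathbf{p}_t + \mathbf{W}_e \mathbf{e}_t + \mathbf{b}\right),
\qquad
r_{\text{I2I}} = \sum_{t=1}^{T} \hat{a}_t
```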
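The two outputs of the Item-to-Item network, the summed pre-softmax weights and the attention-pooled user representation, can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names and toy values are made up, the scoring uses a generic additive tanh form, and the position term is left out for brevity.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def item_to_item(target, behaviors, W_c, W_e, b, z):
    """Additive target attention over the behavior sequence.
    Returns (relevance, pooled): the sum of the raw attention weights
    and the softmax-weighted sum of the behavior vectors."""
    target_part = matvec(W_c, target)
    raw = []
    for e_t in behaviors:
        pre = [c + e + d for c, e, d in zip(target_part, matvec(W_e, e_t), b)]
        raw.append(sum(zi * math.tanh(v) for zi, v in zip(z, pre)))
    relevance = sum(raw)  # summed BEFORE softmax normalization
    top = max(raw)
    exps = [math.exp(s - top) for s in raw]
    total = sum(exps)
    alpha = [x / total for x in exps]
    dim = len(behaviors[0])
    pooled = [sum(a * e[i] for a, e in zip(alpha, behaviors)) for i in range(dim)]
    return relevance, pooled


# Toy inputs (assumed values): one behavior matches the target, one opposes it.
identity = [[1.0, 0.0], [0.0, 1.0]]
target = [1.0, 0.0]
behaviors = [[1.0, 0.0], [-1.0, 0.0]]
relevance, pooled = item_to_item(target, behaviors, identity, identity, [0.0, 0.0], [1.0, 1.0])
print(relevance, pooled)
```

Summing the weights before normalization matters: after softmax the weights always add up to 1, so the sum would carry no information about how strongly the behavior sequence relates to the target item.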
Experiments
We ran a series of experiments on Alimama's public dataset and on a production dataset from the 1688 recommendation scenario to verify the overall effect of the model and to examine the contribution of individual modules.
Offline experiment
Online experiment
We deployed the DMR model online in the "Recommended for You" scenario on 1688. The baseline is DIN[1], our previous CTR model. DMR achieved a relative CTR increase of 5.5% and a relative DPV increase of 12.8%. It has since been fully rolled out.
Summary & Outlook
Our paper, "Deep Match to Rank Model for Personalized Click-Through Rate Prediction", was accepted to AAAI-20 as an oral presentation. Original paper address: https://github.com/lvze92/DMR
DMR provides a framework for jointly training matching and ranking. The U2I correlation representation module can easily be embedded into an existing CTR model, which amounts to adding some effective features to the original model. Our subsequent CTR model iterations will continue to build improvements on the DMR framework.