A Collaborative Estimator of Click and Dwell
▌Abstract
JUMP uses a novel three-layer RNN structure to encode a user's session: a "fast-slow layer" alleviates the problem of short sessions, and an "attention layer" addresses the problem of session noise. Extensive experiments show that JUMP outperforms other recent algorithms in both click-through rate prediction and dwell time prediction.
▌Dwell Time Estimation
To estimate dwell time, we borrow the idea of survival analysis and model how long a user stays on a piece of content as the time until an event occurs; conceptually, the dwell time is the time at which the event "leaving the current content" happens. If we denote the observed dwell time of a sample as O, suppose there is a function g that maps O to a variable following a simple distribution f:
Here f may be a Gaussian distribution, a Gamma distribution, etc. It can be proved that:
where F denotes the cumulative distribution function (CDF) and T is an online approximation to O. With the above formula, we can use maximum likelihood estimation to estimate the dwell time of a sample. We analyzed the dwell times in the RecSys15 data. The original distribution is shown in the lower left figure; after taking the log of the dwell time, we obtain the lower right figure. After the log transform, the dwell time closely follows a normal distribution.
So we take g as the log function, f as the normal distribution function, and finally we can get the likelihood function:
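The choice of g = log and a normal f can be illustrated with a small sketch. It assumes the standard right-censored log-normal form from survival analysis: a density term when the dwell time is fully observed, and a survival term 1 - F when we only know the user stayed at least that long. The function names and the `censored` flag are hypothetical, and this is not necessarily the exact likelihood used by JUMP.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fit_log_normal(times):
    """Maximum likelihood estimate of (mu, sigma) for log(t) ~ Normal(mu, sigma)."""
    logs = [math.log(t) for t in times]
    mu = sum(logs) / len(logs)
    var = sum((x - mu) ** 2 for x in logs) / len(logs)
    return mu, math.sqrt(var)

def dwell_log_likelihood(t, mu, sigma, censored=False):
    """Log-likelihood of one dwell time t, measured in log space.

    censored=True (an assumption of this sketch) means the true dwell
    time is only known to be larger than t.
    """
    z = (math.log(t) - mu) / sigma
    if censored:
        # Survival term: P(log O > log t) = 1 - F(z), floored to avoid log(0).
        return math.log(max(1.0 - normal_cdf(z), 1e-12))
    # Density of log(t) under Normal(mu, sigma).
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
```

The density here is that of log(t); the density of t itself differs only by a term that does not depend on mu or sigma, so the maximum likelihood estimates are unaffected.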
▌Model Learning
We record a user's session as a behavior sequence. The j-th element of the sequence consists of the clicked item i, its dwell time, and a Boolean flag indicating whether it is the last item of the session (or whether the dwell time is too long). We assume that each session S is sampled from a distribution P(S), which can be decomposed into a product of two terms.
The first term represents the likelihood of the click, and the second represents the likelihood of the dwell time; the remaining symbol denotes the behavior before the k-th click in the session. To extract more useful information from session behavior, we propose a three-layer RNN model to encode it.
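As a rough illustration of the decomposition, the sketch below represents one session element and sums the two per-step log-likelihood terms over a session. The `Event` class, its field names, and `session_log_likelihood` are hypothetical names introduced here; the per-step click probabilities and dwell log-likelihoods are assumed to come from some model conditioned on the preceding behavior.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    item: int       # id of the clicked item
    dwell: float    # dwell time on the item, in seconds
    is_last: bool   # True if this is the last item (or the dwell time is too long)

def session_log_likelihood(click_probs: List[float],
                           dwell_log_liks: List[float]) -> float:
    """Sum over steps of log P(click | history) + log P(dwell | history)."""
    if len(click_probs) != len(dwell_log_liks):
        raise ValueError("need one click term and one dwell term per step")
    return sum(math.log(p) + d for p, d in zip(click_probs, dwell_log_liks))

# A two-step session with made-up model outputs for each step.
session = [Event(item=12, dwell=30.0, is_last=False),
           Event(item=7, dwell=95.0, is_last=True)]
total = session_log_likelihood([0.5, 0.25], [-1.0, -2.0])
```

Working in log space turns the product of the two terms into a sum, which is the quantity a training loop would maximize.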
Attention Layer: The main purpose of the attention mechanism is to remove noise from the session and retain the genuinely useful information. It takes the output of the fast-slow layer as input; a denotes the attention weight, and the weights are calculated as follows:
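The general idea of attention weighting can be sketched as follows. This is a minimal example that assumes dot-product scoring against a query vector followed by a softmax; the scoring function actually used by JUMP is not reproduced in this text, so `attend` and its `query` argument are illustrative assumptions.

```python
import math
from typing import List, Tuple

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(states: List[List[float]],
           query: List[float]) -> Tuple[List[float], List[float]]:
    """Return attention weights over the states and their weighted sum."""
    # Assumed scoring: dot product between each state and the query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return weights, context
```

States that score low against the query receive small weights, which is how noisy clicks in a session can be down-weighted in the final representation.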
Fast-Slow Layer: The fast-slow layer is a novel RNN structure. When processing the j-th input, we enter an F-S processing unit, which consists of one slow cell and a sequence of fast cells. The slow cell stores more long-term memory, and the fast cells capture more current information. The F-S processing unit is shown in the figure below:
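Since the figure is not reproduced here, the following toy sketch shows one plausible wiring of such a unit: several fast updates per input and a single slow update. It uses scalar states, plain tanh cells, and fixed made-up weights; the cell type, the wiring, and every name in it are assumptions for illustration rather than the unit defined in the original figure.

```python
import math
from typing import Tuple

def cell(w_h: float, w_x: float, h: float, x: float) -> float:
    """A scalar tanh RNN cell with fixed (hypothetical) weights."""
    return math.tanh(w_h * h + w_x * x)

def fast_slow_step(x: float, fast: float, slow: float,
                   n_fast: int = 3) -> Tuple[float, float]:
    """Process one input with n_fast fast updates and one slow update."""
    if n_fast < 2:
        raise ValueError("this sketch needs at least two fast cells")
    # The first fast cell reads the new input.
    fast = cell(0.5, 1.0, fast, x)
    # The slow cell is updated once per input and changes gradually.
    slow = cell(0.9, 0.1, slow, fast)
    # The second fast cell reads the slow state (long-term memory).
    fast = cell(0.5, 1.0, fast, slow)
    # The remaining fast cells only refine the fast state.
    for _ in range(n_fast - 2):
        fast = cell(0.5, 0.0, fast, 0.0)
    return fast, slow

# Run a short session of three inputs through the unit.
fast_state, slow_state = 0.0, 0.0
for x in [1.0, 0.5, -0.2]:
    fast_state, slow_state = fast_slow_step(x, fast_state, slow_state)
```

Because the fast state passes through several cells for every input, even a short session goes through many nonlinear updates, while the slow state keeps a shorter path back to earlier inputs.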
Embedding Layer: The bottom layer of the network is an embedding layer, which maps a two-tuple to a vector; we use batch normalization to normalize the input:
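A minimal sketch of this step is shown below. It assumes the two-tuple is (item id, dwell time) and that the dwell time enters as its logarithm appended to the item embedding; both are assumptions of this example. The batch normalization omits the learned scale and shift parameters for brevity.

```python
import math
from typing import Dict, List

def embed(item: int, dwell: float, table: Dict[int, List[float]]) -> List[float]:
    """Look up the item embedding and append log(dwell) (assumed encoding)."""
    return table[item] + [math.log(dwell)]

def batch_normalize(batch: List[List[float]], eps: float = 1e-5) -> List[List[float]]:
    """Normalize each feature to zero mean and unit variance across the batch."""
    n, dim = len(batch), len(batch[0])
    out = [[0.0] * dim for _ in range(n)]
    for d in range(dim):
        column = [row[d] for row in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        for r in range(n):
            out[r][d] = (batch[r][d] - mean) / math.sqrt(var + eps)
    return out

# Hypothetical embedding table with two items and two dimensions.
table = {1: [0.0, 2.0], 2: [4.0, 6.0]}
batch = [embed(1, 1.0, table), embed(2, math.e ** 2, table)]
normalized = batch_normalize(batch)
```

Normalizing per feature keeps the dwell time feature on the same scale as the embedding dimensions before the vectors reach the recurrent layers.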
▌Experiment
To verify the performance of the algorithm, we compared JUMP with mainstream session-based prediction algorithms on both click-through rate prediction and dwell time prediction. The compared algorithms are GRU, IGRU, NARM, DTGRU, RMTP, ATRP, and NSR, and the data sets used are RecSys15, CIKM16, and REDDIT.
The results of click-through rate estimation are shown in the table below. JUMP outperforms the other algorithms on all data sets and achieves the best Recall, MRR, and NDCG.
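For reference, the three ranking metrics can be computed as in the sketch below. It assumes the common next-item setting in which each test case has exactly one correct item and the model returns a ranked list; the cutoff k and the function names are choices made for this example, not values taken from the experiments.

```python
import math
from typing import List

def rank_of(target: int, ranked: List[int], k: int) -> int:
    """1-based rank of target within the top-k list, or 0 if it is absent."""
    top = ranked[:k]
    return top.index(target) + 1 if target in top else 0

def recall_at_k(targets: List[int], rankings: List[List[int]], k: int) -> float:
    """Fraction of cases whose target appears in the top k."""
    hits = sum(1 for t, r in zip(targets, rankings) if rank_of(t, r, k) > 0)
    return hits / len(targets)

def mrr_at_k(targets: List[int], rankings: List[List[int]], k: int) -> float:
    """Mean reciprocal rank, counting 0 when the target is outside the top k."""
    ranks = [rank_of(t, r, k) for t, r in zip(targets, rankings)]
    return sum(1.0 / rk for rk in ranks if rk > 0) / len(targets)

def ndcg_at_k(targets: List[int], rankings: List[List[int]], k: int) -> float:
    """NDCG with one relevant item per case, so the ideal DCG equals 1."""
    ranks = [rank_of(t, r, k) for t, r in zip(targets, rankings)]
    return sum(1.0 / math.log2(rk + 1) for rk in ranks if rk > 0) / len(targets)

# Three test cases: a hit at rank 1, a hit at rank 2, and a miss.
targets = [3, 7, 9]
rankings = [[3, 1, 2], [1, 7, 2], [1, 2, 3]]
```

Recall only checks whether the target is in the top k, while MRR and NDCG also reward placing it closer to the top of the list.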
We also observed the impact of the embedding dimension on recall, and the results are shown in the figure below. As the embedding dimension increases, recall improves to a certain extent but essentially peaks at around 100 dimensions. JUMP also remains better than the other algorithms: its blue curve is always above the other curves.
In addition to the click-through rate, we evaluated the dwell time estimation task, and the results are shown in the table below. Compared with the ATRP, RMTP, and NSR algorithms, JUMP significantly improves prediction accuracy.
This paper proposes a novel algorithm, JUMP, that simultaneously estimates the click-through rate and dwell time in a session. It makes three main contributions: (1) it models the user's dwell time with survival analysis, which gives the model a sound theoretical basis; (2) it proposes a three-layer RNN structure that improves the model through an attention mechanism; (3) it uses a newly designed fast-slow structure to improve learning on short sessions. Extensive experiments verify the effectiveness of the algorithm, and its results on multiple public data sets improve substantially on those of other algorithms.