Dynamic Sorting Based on Heterogeneous Content Streams

Search engines play a central role in e-commerce because they guide users' potential purchase behavior. A traditional e-commerce search engine is usually a product search engine: the user enters a query and the engine returns a product list. With the rise of self-media, however, more and more users are willing to share their shopping experiences in the form of articles, reviews, and videos. In this article, these are collectively referred to as content streams. Content search engines have emerged to give users more shopping assistance: when a user searches for a product, the engine recommends high-quality content streams that help the user choose products they are likely to enjoy.

▌Research background:

At present, sorting heterogeneous data still poses many challenges. First, the cross-domain knowledge provided by product search engines and content search engines should be fully utilized, so that users' behavior preferences observed in the product search engine can be applied to the content search engine. Second, the algorithm needs to support sorting multimedia content.

In this paper, our goal is to solve the problem of ranking heterogeneous data across product search engines and content search engines, and to recommend rich, personalized content streams to users. We divide the algorithm into two steps: 1) heterogeneous content stream type sorting, which determines what type of content stream each slot displays (articles, videos, or product lists); 2) homogeneous content stream content sorting, which, given the content stream type of a slot, ranks the candidate content of that type and inserts the item with the highest similarity. The second step uses the well-known DSSM model. This article mainly focuses on the first step.

▌The proposed algorithm:

In this paper, two algorithms are proposed for sorting content stream types: an independent multi-armed bandit algorithm (iMAB) and a personalized Markov deep neural network algorithm (pMDNN).

In the independent multi-armed bandit algorithm, we need to calculate a ratio θ from ipv (click data) and pv (browsing data). A higher θ means that users are more likely to click when they see this type of content stream in the search list. For each search slot (pit), we first compute a prior distribution of θ for each content stream type i, where i is one of post, list, or video. We use the Beta distribution for this prior: one of its two parameters represents the historical ipv click data of type i, and the other represents the historical browsing data. The posterior probability distribution is then updated by a real-time streaming data task. Expressed as the following probability formula:

In this way, the content stream type of each slot is chosen independently of all other slots. The pseudocode is as follows:
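A minimal Python sketch of this independent-slot scheme is given here for illustration. It assumes Thompson sampling as the selection rule (the text does not name one), and it assumes that the Beta parameters are built from clicks and non-clicked views with a uniform Beta(1, 1) offset. The class name `SlotBandit` and its methods are hypothetical.

```python
import random

CONTENT_TYPES = ("post", "list", "video")


class SlotBandit:
    """Independent Beta-Bernoulli bandit for one search slot (hypothetical helper)."""

    def __init__(self, prior_clicks, prior_views):
        # Assumption: alpha counts historical clicks (ipv) and beta counts
        # historical views (pv) that did not lead to a click, each offset by 1.
        self.alpha = {t: 1.0 + prior_clicks[t] for t in CONTENT_TYPES}
        self.beta = {
            t: 1.0 + max(prior_views[t] - prior_clicks[t], 0)
            for t in CONTENT_TYPES
        }

    def choose(self, rng):
        # Thompson sampling: draw one theta per type, show the largest draw.
        samples = {
            t: rng.betavariate(self.alpha[t], self.beta[t])
            for t in CONTENT_TYPES
        }
        return max(samples, key=samples.get)

    def update(self, content_type, clicked):
        # Posterior update from one real-time impression.
        if clicked:
            self.alpha[content_type] += 1.0
        else:
            self.beta[content_type] += 1.0

    def expected_theta(self, content_type):
        # Mean of Beta(alpha, beta).
        a, b = self.alpha[content_type], self.beta[content_type]
        return a / (a + b)


def choose_all_slots(bandits, rng):
    # Each slot decides on its own; no information flows between slots.
    return [bandit.choose(rng) for bandit in bandits]
```

Because every slot keeps its own counts, two neighboring slots can pick the same type; nothing in this scheme encourages a diverse arrangement.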

The selection of dependent heterogeneous data stream types is determined by three factors: the user, the query, and the type of the previous slot. First, different users can express different preferences under the same query. For example, when searching for "dress", one user may prefer an article introduction while another may prefer a video introduction. Moreover, no user likes a single type of display; users prefer, to varying degrees, an arrangement of diversified content stream types. For the same query, therefore, different sorting results should be shown to different users. Our proposed personalized Markov deep neural network algorithm consists of two steps: learning representations of users and queries, and learning to predict slot types.

Low-dimensional representation of users and queries. We build a graph that includes users, queries, and content, and use node2vec to learn the embeddings of users and queries, as shown in the figure below:

The middle part of the figure is the embedding representation of the training node. The input layer is a one-hot encoding of the nodes. The weight matrix W is the embedding of all nodes, which maps the node one-hot encoding into a D-dimensional space.
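To make the role of W concrete, the sketch below shows that multiplying a one-hot vector by W simply selects one row of W, which is the node's D-dimensional embedding. The node names and the values in `W` are made up for illustration, and the node2vec training procedure itself (random walks over the graph followed by skip-gram training) is not shown.

```python
def one_hot(index, size):
    """Return a one-hot vector of the given size with a 1.0 at index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def project(vector, weights):
    """Multiply a row vector of length N by an N x D weight matrix."""
    dims = len(weights[0])
    return [
        sum(vector[n] * weights[n][d] for n in range(len(vector)))
        for d in range(dims)
    ]


# Hypothetical graph with 4 nodes embedded in D = 3 dimensions.
NODES = ["user_a", "user_b", "query_dress", "query_shoes"]
W = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]


def embed(node):
    """Look up a node embedding via one-hot encoding times W."""
    index = NODES.index(node)
    return project(one_hot(index, len(NODES)), W)
```

In practice the multiplication is replaced by a direct row lookup, since all other terms are zero.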

Slot type prediction. Our objective function is defined as

Here, X represents the features of the i-th input slot. To simplify the pMDNN model and speed up computation, we only use the information of the slot immediately preceding the slot being predicted. However, this raises a problem: how do we predict the type of the first slot? Here we use cross-domain knowledge: we extract information about the products that the user recently browsed in the product search engine and map it to the features of a content search slot, so that it meets the input requirements of the model. The input layer of our model consists of the user embedding, the query embedding, and the embedding of the previous slot. It can be expressed as

Three fully connected layers follow the input layer. Each layer applies a linear transformation, and the model is trained with cross-entropy as the loss function. The hidden layers use ReLU as the activation function, and the output layer applies Softmax.
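The architecture described above can be sketched in plain Python as follows. This is only an illustration under several assumptions: the hidden width, the random initialization, the reading of "three fully connected layers" as three hidden layers plus a Softmax output layer, and the greedy slot-by-slot decoding in `rank_slot_types` are all choices made for this sketch. The weights are untrained, so the predictions carry no meaning.

```python
import math
import random

SLOT_TYPES = ("post", "list", "video")


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def dense(v, weights, bias):
    """Linear layer: one weight row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def init_layer(rng, n_in, n_out):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_model(rng, input_dim, hidden_dim=8):
    # Three hidden layers followed by an output layer over slot types.
    sizes = [input_dim, hidden_dim, hidden_dim, hidden_dim, len(SLOT_TYPES)]
    return [init_layer(rng, a, b) for a, b in zip(sizes, sizes[1:])]


def predict(model, user_emb, query_emb, prev_slot_emb):
    # Input layer: concatenation of the three embeddings.
    h = user_emb + query_emb + prev_slot_emb
    for weights, bias in model[:-1]:
        h = relu(dense(h, weights, bias))
    weights, bias = model[-1]
    return softmax(dense(h, weights, bias))


def cross_entropy(probs, target_index):
    """Loss for one training example with a known slot type."""
    return -math.log(probs[target_index])


def rank_slot_types(model, user_emb, query_emb, first_prev_emb,
                    type_embs, n_slots):
    # first_prev_emb stands in for the features mapped from the user's
    # recent product-search browsing, used to predict the first slot.
    prev, chosen = first_prev_emb, []
    for _ in range(n_slots):
        probs = predict(model, user_emb, query_emb, prev)
        best = SLOT_TYPES[probs.index(max(probs))]
        chosen.append(best)
        prev = type_embs[best]  # Markov step: feed the chosen type forward
    return chosen
```

Each prediction depends only on the user, the query, and the immediately preceding slot, which is what keeps the model small and fast at serving time.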

▌Experimental results:

We deployed the proposed models in an A/B testing bucket environment and selected five main metrics to compare iMAB and pMDNN. pv is the number of content items displayed; pv click is the number of clicks on displayed content; uv is the number of users who use the content search engine; uv click is the number of users who click on a content stream; and uv ctr is the proportion of users who click on a content stream.
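As a small worked example of these definitions, the sketch below computes the metrics from an impression log. It assumes that uv ctr is uv click divided by uv, which the definitions imply but do not state, and the log format (one `(user_id, clicked)` pair per displayed item) is hypothetical.

```python
def summarize(impressions):
    """Compute pv, pv click, uv, uv click and uv ctr from an impression log.

    impressions: iterable of (user_id, clicked) pairs, one per displayed item.
    """
    pv = 0
    pv_click = 0
    users = set()
    clickers = set()
    for user_id, clicked in impressions:
        pv += 1
        users.add(user_id)
        if clicked:
            pv_click += 1
            clickers.add(user_id)
    uv = len(users)
    uv_click = len(clickers)
    # Assumption: uv ctr = uv click / uv.
    uv_ctr = uv_click / uv if uv else 0.0
    return {
        "pv": pv,
        "pv_click": pv_click,
        "uv": uv,
        "uv_click": uv_click,
        "uv_ctr": uv_ctr,
    }


# Hypothetical log: three users, five displayed items.
LOG = [("u1", True), ("u1", False), ("u2", False), ("u3", True), ("u3", True)]
METRICS = summarize(LOG)
```

Note that a user who clicks several times raises pv click each time but counts only once toward uv click.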

The following table shows the experimental results, in which pMDNN outperforms iMAB. The gains in uv click and uv ctr are especially important for our scenario: growth in uv click means that more users prefer the content search engine because it helps them shop better, while growth in uv ctr shows that users of the content search engine accept the content streams we recommend. The improvement in pv click likewise indicates that the proposed model better matches users' individual needs.

Based on pv click and uv ctr, we conclude that pMDNN, which applies cross-domain knowledge and globally optimizes the types across multiple slots, is indeed better than iMAB, which treats each slot independently.
