How to solve the problem of lengthy product titles

Summary

On e-commerce platforms such as Taobao and Tmall, merchants usually write redundant product titles for SEO purposes. On the app side, where display space is limited, overly long product titles often cannot be shown in full and have to be truncated, which seriously hurts the user experience. Compressing the original product title to a limited length without affecting overall transactions is a very challenging task.

Previous title summarization methods often require extensive manual preprocessing, which is costly, and they ignore metrics that matter specifically in the e-commerce scenario, such as click-through rate and conversion rate. Motivated by this, we propose a product title compression method that uses user search logs for multi-task learning.

The method trains two sequence-to-sequence tasks simultaneously. The main task performs extractive summarization from the original title to the short title based on the Pointer Network model; the auxiliary task generates, from the original title, the user search queries for the corresponding product, based on an encoder-decoder model with an attention mechanism. The two tasks share the encoder parameters, and their attention distributions over the original title are jointly optimized during training, so that the two tasks attend to the important information in the original title as consistently as possible.

Offline human evaluation and online experiments show that the short titles generated by the multi-task learning method retain the core information of the original product titles while also surfacing the words users search for, ensuring that transaction conversion is not affected.

Research Background

Product titles are an important medium through which sellers and buyers communicate on e-commerce platforms. A user enters a query in the search box, browses the product list on the search results page (SRP), selects a target product, and finally completes the purchase. Along this entire transaction path, product titles, product descriptions, product images, and other information jointly influence the user's purchase decision. A title that is informative but not verbose greatly improves the user experience.

According to the 40th "Statistical Report on Internet Development in China", as of June 2017 the number of mobile Internet users in China had reached 724 million, and the share of netizens going online via mobile phone had risen from 95.1% at the end of 2016 to 96.3%. Online purchasing keeps shifting from the PC to mobile apps, and the gap continues to widen, so major e-commerce platforms are tilting their resources toward their apps. The most obvious difference between a PC and an app is the display size: a smartphone screen is usually between 4.5 and 5.5 inches, far smaller than a PC screen, which places new demands on algorithms and product design.

At present, Taobao product titles are written mainly by the merchants themselves. To improve search recall and drive transactions, merchants often pile redundant words into their titles. When users browse on a phone, these overly long titles are constrained by the screen size, displayed incompletely, and truncated, which seriously hurts the user experience.

As shown in Figure 1, on the SRP the original title cannot be displayed in full: the title contains nearly 30 characters, but only about 14 can be shown. To see the full title, the user has to click through to the product detail page. Moreover, in personalized push and recommendation scenarios, the short title is the main information carrier and is also length-constrained. How to convey a product's core attributes in the shortest possible text, arouse users' interest in clicking and browsing, and increase the conversion rate is an issue worth studying in depth.

Existing Methods
Text summarization (compression) is an important research direction in natural language processing. By how the summary is produced, methods fall into two types: extractive and abstractive. As the names imply, extractive methods select the summary's sentences and words from the original text, whereas abstractive methods are more flexible and do not require the summary's content to come verbatim from the source. Traditional extractive methods can be roughly divided into greedy methods, graph-based methods, and constraint-based optimization methods. In recent years, neural network methods have also been applied to text summarization and have made significant progress, especially on abstractive summarization. Existing industrial methods treat compressing the text's length as the sole optimization goal. In the e-commerce scenario, however, there are considerations beyond the compression rate: reducing the length of a product title without hurting overall transaction conversion remains a challenging problem for the industry.

Method

As shown in Figure 2, the proposed multi-task learning method comprises two sequence-to-sequence tasks. The main task is product title compression: a Pointer Network generates the short title by selecting keywords from the original title through its attention mechanism. The auxiliary task is search query generation: an attention-based encoder-decoder model generates a user search query from the original product title. The two tasks share encoder parameters, and their attention distributions over the original title are jointly optimized so that both tasks attend to the important information in the title as consistently as possible. The auxiliary task helps the main task retain words that carry more information and are more likely to attract user clicks. Accordingly, we construct training data for both tasks. The main task uses original titles of products in the women's clothing category paired with short titles rewritten by experts for the Mobile Taobao recommendation channel; the auxiliary task uses original titles in the same category paired with the user search queries that led to transactions.
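To make this concrete, below is a minimal PyTorch sketch of the setup just described: one shared GRU encoder feeds a pointer-style decoder for the short title and a generative attention decoder for the query, and an agreement penalty ties their attention distributions together. The layer choices, dimensions, shared attention scorer, and L2 agreement distance are our illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoderMTL(nn.Module):
    """Sketch: shared encoder + pointer decoder (title) + attention decoder (query)."""

    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Encoder parameters are shared by both tasks.
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.ptr_decoder = nn.GRUCell(emb_dim, hid_dim)   # main task
        self.qry_decoder = nn.GRUCell(emb_dim, hid_dim)   # auxiliary task
        self.qry_out = nn.Linear(hid_dim * 2, vocab_size)
        # Additive attention scorer (shared across tasks in this sketch).
        self.attn = nn.Sequential(
            nn.Linear(hid_dim * 2, hid_dim), nn.Tanh(), nn.Linear(hid_dim, 1))

    def attention(self, dec_h, enc_out):
        # dec_h: (B, H); enc_out: (B, T, H) -> weights over title tokens (B, T)
        dec = dec_h.unsqueeze(1).expand(-1, enc_out.size(1), -1)
        scores = self.attn(torch.cat([enc_out, dec], dim=-1)).squeeze(-1)
        return F.softmax(scores, dim=-1)

    def forward(self, title, short_pos, query):
        # title: (B, T) token ids; short_pos: (B, S) gold positions of the
        # short-title words inside `title`; query: (B, Q) gold query ids.
        emb = self.embed(title)
        enc_out, h = self.encoder(emb)
        h = h.squeeze(0)                                   # (B, H)

        # Main task (Ptr-Net): NLL of the attention mass on the gold position.
        ptr_h, ptr_loss, ptr_attns = h, 0.0, []
        inp = emb.new_zeros(emb.size(0), emb.size(2))      # stand-in <s> input
        for t in range(short_pos.size(1)):
            ptr_h = self.ptr_decoder(inp, ptr_h)
            a = self.attention(ptr_h, enc_out)
            ptr_attns.append(a)
            gold = a.gather(1, short_pos[:, t:t + 1]).squeeze(1)
            ptr_loss = ptr_loss - torch.log(gold + 1e-9).mean()
            inp = self.embed(title.gather(1, short_pos[:, t:t + 1])).squeeze(1)

        # Auxiliary task: attention-based encoder-decoder query generation.
        qry_h, qry_loss, qry_attns = h, 0.0, []
        inp = emb.new_zeros(emb.size(0), emb.size(2))
        for t in range(query.size(1)):
            qry_h = self.qry_decoder(inp, qry_h)
            a = self.attention(qry_h, enc_out)
            qry_attns.append(a)
            ctx = torch.bmm(a.unsqueeze(1), enc_out).squeeze(1)  # context vector
            qry_loss = qry_loss + F.cross_entropy(
                self.qry_out(torch.cat([qry_h, ctx], dim=-1)), query[:, t])
            inp = self.embed(query[:, t])

        # Agreement term: pull the two tasks' average attention over the
        # title toward each other (L2 distance is our assumption here).
        pa = torch.stack(ptr_attns).mean(dim=0)            # (B, T)
        qa = torch.stack(qry_attns).mean(dim=0)
        agree = ((pa - qa) ** 2).sum(dim=-1).mean()
        return ptr_loss, qry_loss, agree
```

The overall objective would then be `ptr_loss + qry_loss + lam * agree` for some weight `lam`; dropping the agreement term recovers the Vanilla-MTL variant compared in the experiments below.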

Main Contributions

The proposed multi-task learning method compresses product titles such that the generated short titles outperform traditional extractive summarization methods in offline automatic evaluation, human evaluation, and online evaluation.

End-to-end training avoids the heavy manual preprocessing and feature engineering required by traditional methods.

The attention-distribution consistency constraint in multi-task learning makes the final short title surface the important words in the original title, especially the core words that drive transactions; this is also valuable for other e-commerce scenarios.

Experimental Results

We experimented on product title data from Taobao's women's clothing category and compared five text summarization methods. The first is a baseline that simply truncates the title to the target length (Trunc.); the second is the classic integer linear programming method (ILP), which requires preprocessing such as word segmentation, NER, and term weighting; the third is an extractive encoder-decoder method based on the Pointer Network (Ptr-Net); the fourth is a multi-task learning method that simply sums the loss functions of the two subtasks into the overall objective (Vanilla-MTL); and the fifth is the multi-task learning method proposed in this paper, which adds the attention-distribution consistency term (Agree-MTL).
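For reference, the ILP baseline can be illustrated as a small knapsack-style program: keep the subset of segmented terms with the highest total importance, subject to a length budget. The terms, weights, and budget below are hypothetical stand-ins for the word segmentation / NER / term-weighting preprocessing mentioned above, and the production ILP likely carries richer constraints; this sketch uses the open-source PuLP solver (pip install pulp).

```python
import pulp

terms   = ["MaxMara", "2017", "new", "slim", "wool", "coat", "women"]  # segmented title
weights = [0.9, 0.3, 0.2, 0.5, 0.7, 0.95, 0.4]   # hypothetical term-importance scores
budget  = 14                                      # short-title character budget

prob = pulp.LpProblem("title_compression", pulp.LpMaximize)
keep = [pulp.LpVariable(f"keep_{i}", cat="Binary") for i in range(len(terms))]
# Objective: maximize the total importance of the kept terms.
prob += pulp.lpSum(w * k for w, k in zip(weights, keep))
# Constraint: the kept terms must fit within the length budget.
prob += pulp.lpSum(len(t) * k for t, k in zip(terms, keep)) <= budget
prob.solve(pulp.PULP_CBC_CMD(msg=False))

# Kept terms in their original order form the short title.
print(" ".join(t for t, k in zip(terms, keep) if k.value() == 1))
```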

Automatic evaluation of different methods

Table 1 compares the different text summarization methods using three ROUGE scores, computed between each generated short title and the reference short title, as the automatic evaluation metrics. The multi-task learning method proposed in this paper significantly outperforms all the other methods.
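As an illustration, such ROUGE scores (presumably ROUGE-1, ROUGE-2, and ROUGE-L) can be computed with the open-source rouge-score package (pip install rouge-score). The titles below are hypothetical whitespace-tokenized stand-ins; Chinese titles would be word-segmented first.

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"])
reference = "maxmara wool coat women slim 2017"   # expert-written short title
candidate = "maxmara slim wool coat"              # model-generated short title

# score() returns precision/recall/F1 for each requested ROUGE variant.
for name, score in scorer.score(reference, candidate).items():
    print(f"{name}: F1 = {score.fmeasure:.3f}")
```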

Human evaluation of different methods

Table 2 compares human evaluations of the short titles generated by the different methods. Because a product's core product word is commercially sensitive in the e-commerce setting, in addition to the common readability and informativeness metrics we also evaluated whether the core product word in each generated short title is correct (accuracy). As Table 2 shows, the proposed method outperforms the other methods on all three metrics.

In addition to offline automatic and human evaluation, we ran an A/B test in the live online environment. Compared with the ILP compression method previously deployed online, the proposed multi-task learning method improves CTR by 2.58% and CVR by 1.32%.

Figure 3 shows example short titles generated by the different methods. Limited by the quality of their preprocessing, the short titles from the truncation and ILP baselines have poor fluency and readability, whereas Ptr-Net and the multi-task methods, being sequence-to-sequence approaches, generate short titles that read better than both baselines. The example on the left of Figure 3 shows that the short title generated by our method surfaces words that appear in high-frequency user search queries (users often search for the English brand name rather than the Chinese one), which makes a transaction more likely.

Conclusion

Because of merchants' excessive SEO, product titles on C2C e-commerce platforms are usually long and redundant and cannot be displayed in full in the app. To solve this problem, this paper compresses overly long product titles with an extractive summarization method.

Traditional summarization methods only compress the title while preserving its semantics; they do not consider the effect the compressed title has on click-through rate and transaction conversion rate in the e-commerce scenario. E-commerce platforms have accumulated large amounts of user search queries and product transaction data, and with this data the original long titles can be compressed in a far more targeted way.

We therefore propose a multi-task learning method for title compression that contains two sequence learning subtasks. The main task is extractive summarization from the original title to the short title based on the Pointer Network model; the auxiliary task uses an attention-based encoder-decoder model to generate, from the original title, the user search queries associated with the corresponding product. The two tasks share encoder parameters, and their attention distributions over the original title are jointly optimized to be as consistent as possible, so that the short title produced by the main task retains the core information of the original title while favoring keywords that drive transaction conversion.

Offline human evaluation and online experiments show that, without hurting transaction conversion, the short titles generated by this method surpass those of traditional summarization methods in readability, informativeness, and core product word accuracy.
