Evolution of e-commerce search algorithm technology

1. Some features of search

The platform hosts billions of products, organized into thousands of leaf categories, hundreds of first-level categories, and more than a dozen industries. Helping users find products that match their intent is the primary problem that search has to solve.

In its overall structure and process, e-commerce search has much in common with a traditional search engine: data is collected, analyzed, and indexed to build an index library; the inverted index is searched according to the keywords the user enters; the relevance between each product and the query is evaluated; the results are ranked for output; and some form of user relevance feedback mechanism is implemented.
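
As a rough illustration of the pipeline just described (index the data, look up the query terms in the inverted list, score relevance, rank the results), here is a minimal Python sketch. The whitespace tokenizer, the function names `build_inverted_index` and `search`, the sample products, and the overlap-based relevance score are all assumptions made for illustration; they are not the production system's method.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_inverted_index(products):
    """Map each token to the set of product ids whose title contains it."""
    index = defaultdict(set)
    for pid, title in products.items():
        for tok in tokenize(title):
            index[tok].add(pid)
    return index


def search(index, products, query):
    """Return ids of products containing every query term, best match first."""
    terms = tokenize(query)
    if not terms:
        return []
    # Intersect the posting lists of all query terms (AND semantics).
    postings = [index.get(t, set()) for t in terms]
    candidates = set.intersection(*postings)

    def score(pid):
        # Toy relevance: fraction of title tokens that are query terms.
        title_tokens = tokenize(products[pid])
        return sum(tok in terms for tok in title_tokens) / len(title_tokens)

    # Sort by descending score, then by id so ties are deterministic.
    return sorted(candidates, key=lambda pid: (-score(pid), pid))


# Hypothetical sample catalog.
PRODUCTS = {
    "p1": "red cotton dress",
    "p2": "red dress",
    "p3": "blue cotton shirt",
}
INDEX = build_inverted_index(PRODUCTS)
```

With this toy catalog, `search(INDEX, PRODUCTS, "red dress")` returns `["p2", "p1"]`: both titles contain both terms, and the shorter title is a tighter match.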

As a product search for e-commerce, however, its inherently commercial nature gives it technical characteristics of its own.

In terms of data updates, the data changes very quickly. A large volume of new products is uploaded to the site every day, and each new product must become searchable as soon as it is uploaded. Web search is different: anyone can publish a new web page, but whether a search engine indexes it is another matter. At the same time, a large number of existing products are updated every day, including changes to titles and descriptions, prices, and images, as well as product removals. These changes also need to be reflected in search in real time so that users can find up-to-date product information. In web search, by contrast, many pages are static, the relationships between pages change slowly, and most index updates do not carry the real-time requirements that e-commerce search does.

In terms of data sources, product images play an important role when users research and purchase products, and images occupy a large share of the search results page. How to use image information more effectively, whether through image-based retrieval, image quality assessment, or the relationship between images and text, is something search has to consider and handle.

Another characteristic is the full-link nature of the data. Searching, comparing, and purchasing all happen on the site. With a general web search engine, users jump to other sites after searching, so it is difficult to obtain behavior data from before and after the search. In e-commerce search, the user clicks on some of the products, compares them, communicates with sellers, and then either places an order or returns to continue searching. The data available before, during, and after the search is very rich, and this full-link user behavior data helps us design better search ranking algorithms.

The last, and more important, point is that search sits within an ecosystem. The design of the ranking algorithm reflects not only the technical goals of search itself but also commercial demands. In web search, whether an ordinary web page is indexed, and whether it is displayed once indexed, is not a matter of livelihood for the page's owner. On an e-commerce platform the situation is completely different: many businesses depend on the platform for their livelihood and for employment, and the traffic and transactions of online stores affect the lives of many people. The design of the search algorithm therefore has to consider not only the user's search experience but also business rules that ensure fairness and keep traffic from concentrating on a few sellers. Many search algorithm principles, rules, and results are communicated to sellers to guide them to develop in a better direction.

2. Evolution of search algorithm technology

Search is where a massive number of consumers interact with the platform and the main channel through which a large number of merchants conduct business on it, which makes it an ideal scenario for intelligent applications of big data. Over years of development, and building on a steadily improving engineering architecture, search algorithms have evolved from an era of simple manual operations and simple algorithmic rules into a complete system of offline, online, and real-time deep learning and intelligent decision-making, and have become the intelligent hub for traffic distribution and business growth on the Alibaba e-commerce platform. The iterative progress of search algorithm technology can be roughly divided into the following four stages:

2.1 Retrieval era

At this stage of the business, search ranking revolved mainly around rules and carousels. The volume of data and the number of users were still at a manageable level. Operations and product staff with domain knowledge often acted as the makers of information display rules, defining the product display logic behind each query term based on subjective judgment and market sense. Search at this stage also used some basic algorithmic logic to ensure that information was matched correctly and that people and goods were matched fairly: a relevance model based on traditional search engine technology ensured that user query terms matched product titles effectively, and a sales popularity model ensured that products well received by consumers got more display opportunities.

On the whole, though, the final ranking was still obtained by combining the various factors through hand-written rules. The advantage of hand-written rules is that they are easy to understand and adjust, but the disadvantages are just as clear. As the platform grows, simple rules cannot accurately capture how efficiently people and goods are matched, and they are easily exploited by unscrupulous merchants to disrupt market order.
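
To make the idea of rule-based ranking concrete, the following Python sketch combines a relevance score and a sales popularity score using fixed, hand-chosen weights. The weights, the log scaling of sales, and the names `RULE_WEIGHTS`, `popularity_score`, `rule_score`, and `rank` are hypothetical and stand in for whatever rules the operations staff actually wrote.

```python
import math

# Hand-set weights (hypothetical values): the "rules" chosen by people, not learned.
RULE_WEIGHTS = {"relevance": 0.7, "popularity": 0.3}


def popularity_score(sales, max_sales):
    """Scale sales into [0, 1] on a log scale so best sellers do not dominate."""
    if max_sales <= 0:
        return 0.0
    return math.log1p(sales) / math.log1p(max_sales)


def rule_score(relevance, sales, max_sales, weights=RULE_WEIGHTS):
    """Weighted sum of the two factors named in the text."""
    return (weights["relevance"] * relevance
            + weights["popularity"] * popularity_score(sales, max_sales))


def rank(items):
    """Sort items (dicts with 'relevance' and 'sales') by rule score, best first."""
    max_sales = max((it["sales"] for it in items), default=0)
    return sorted(
        items,
        key=lambda it: rule_score(it["relevance"], it["sales"], max_sales),
        reverse=True,
    )
```

The sketch also shows the weakness the text points out: a merchant who can inflate `sales` gains rank directly, because the weights are fixed and public knowledge of the rule is enough to exploit it.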

2.2 The era of large-scale machine learning

As the platform expanded, merchants joined on a large scale, actively managed their stores, and published products. A relatively structured product organization system, including category structures and attribute information, together with sales and reviews accumulated per product, provided important raw data for understanding products better. Consumers interacted with the platform more and more frequently through search and through product pages at every level. Data came to be organized in a structure keyed by user, and feedback signals could flow effectively in a closed loop. All of this provided important data for understanding users.

This accumulation of useful data provided the soil needed to apply machine learning at scale. Search entered a period of developing various large-scale models, such as click prediction models, and of studying the problems that come with very large feature sets, high feature complexity, strict data timeliness, and frequent model training, while building the capacity to analyze and mine data on the order of millions or even billions. The number of factors involved in ranking kept growing. At first there were category relevance, text relevance, and product popularity scores; later, seller scores were added to balance traffic among sellers; later still, to improve the user experience, factors such as personalized user-product click prediction and image quality were added. Search also began to use learning to rank (LTR), constructing training samples from product click and transaction data and learning the ranking weights by regression.
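
The LTR step described above (build samples from click and transaction logs, then learn the ranking weights by regression) can be sketched as a pointwise logistic regression. This is only an assumed, minimal form: the two feature names, the log rows, the label rule in `build_samples`, and the plain gradient descent in `train_weights` are illustrative choices, not the models actually used.

```python
import math

FEATURES = ["text_relevance", "popularity"]  # hypothetical ranking factors


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_samples(log_rows):
    """Turn log rows into (features, label); label is 1 if clicked or purchased."""
    samples = []
    for row in log_rows:
        x = [row[name] for name in FEATURES]
        y = 1.0 if row["clicked"] or row["purchased"] else 0.0
        samples.append((x, y))
    return samples


def train_weights(samples, epochs=2000, lr=0.5):
    """Fit logistic-regression weights with full-batch gradient descent."""
    w = [0.0] * len(FEATURES)
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in samples:
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank(candidates, w):
    """Order candidate names by the learned weighted sum, best first."""
    return sorted(
        candidates,
        key=lambda name: -sum(wi * xi for wi, xi in zip(w, candidates[name])),
    )


# Hypothetical log rows: clicks follow text relevance, not popularity.
LOGS = [
    {"text_relevance": 0.9, "popularity": 0.1, "clicked": True, "purchased": False},
    {"text_relevance": 0.8, "popularity": 0.9, "clicked": True, "purchased": True},
    {"text_relevance": 0.2, "popularity": 0.8, "clicked": False, "purchased": False},
    {"text_relevance": 0.1, "popularity": 0.2, "clicked": False, "purchased": False},
]
WEIGHTS, BIAS = train_weights(build_samples(LOGS))
```

Unlike hand-set rule weights, the weights here come from the data: because clicks in the toy log track text relevance, the learned weight on `text_relevance` ends up much larger than the one on `popularity`.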

2.3 The era of large-scale real-time online learning

First, compared with general search, e-commerce search places higher demands on real-time computing and learning, because users have richer, multi-dimensional interactions and longer browsing paths. If the system can capture a user's behavior early in the path in real time, model it in the engine, and apply it later in the same path, the effect on the efficiency of the whole path and on the user experience is critical.

Second, the distribution of user behavior is not static, which breaks the independent and identically distributed (i.i.d.) assumption that offline-trained supervised learning algorithms rely on. In a major promotion such as Double 11, when a single day's traffic is a multiple of an ordinary day's, the distribution shifts even more sharply.

Finally, because only a limited number of products can be displayed in search results, the set of products that enters the log system and receives user feedback is only a fraction of the full catalog, so what a model learns offline is inconsistent with what it faces online. An online learning system can alleviate this inconsistency to some extent.

We therefore built a real-time computing and online learning system. It analyzes and processes massive volumes of user behavior and the associated products within seconds, extracts multi-dimensional user and product features, and uses a distributed Parameter Server architecture for online learning, so that user behavior can influence online services such as search ranking within seconds.

We implemented, step by step, "real-time features" -> "real-time ranking factor model" -> "real-time top-level LTR/Bandit model", completing a three-part real-time system. At the micro level, we implemented a series of models including pointwise FTRL updated within seconds, a real-time pairwise matrix factorization model, and a real-time bilinear model. On top of these, we built macro-control models such as real-time learning to rank and real-time multi-armed bandits, upgrading the system to a dual-link real-time system.
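
For readers unfamiliar with FTRL, the sketch below shows the commonly published per-coordinate FTRL-Proximal update for logistic regression, applied one event at a time. It is a minimal single-process illustration, not the distributed Parameter Server implementation described above; the class name `FTRLProximal`, the hyperparameter values, and the feature names in the sample stream are assumptions.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression on sparse features."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.01, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def _weight(self, key):
        """Closed-form weight; L1 keeps rarely useful features at exactly zero."""
        z = self.z.get(key, 0.0)
        n = self.n.get(key, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x):
        """Predicted click probability for a sparse feature dict {name: value}."""
        return sigmoid(sum(self._weight(k) * v for k, v in x.items()))

    def update(self, x, y):
        """Consume one (features, label) event and return the pre-update prediction."""
        p = self.predict(x)
        for k, v in x.items():
            g = (p - y) * v                      # gradient of log loss for this coordinate
            w = self._weight(k)                  # weight before this update
            n = self.n[k]
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[k] += g - sigma * w
            self.n[k] = n + g * g
        return p


# Hypothetical event stream: one feature always clicked, one never clicked.
model = FTRLProximal()
for _ in range(200):
    model.update({"query_match": 1.0}, 1)
    model.update({"spam_title": 1.0}, 0)
```

Each event touches only the coordinates present in that event, which is what makes this style of update practical for streaming logs with very sparse, very high-dimensional features.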

The online learning system also provides strong support for precise traffic regulation, so that business decisions take effect more quickly and effectively. In addition, we abstracted the algorithmic part of online learning into AOP (Algorithm One-stop Platform), a general one-stop platform for online machine learning algorithms that makes building and deploying online learning models more convenient and efficient and is highly scalable. The online learning system has since become one of the basic components of the search architecture and has played a major role in improving the search experience, supporting business decisions, and supporting major promotions such as Double 11.

2.4 The era of deep learning and intelligent decision-making

Artificial intelligence represented by deep learning and reinforcement learning has brought brand-new changes to search technology, especially in the three directions of semantic search, search personalization and intelligent decision-making.

In semantic search, we designed and implemented a representation learning framework for queries. Through multi-task learning and collaborative training, it provides a unified representation vector for a series of applications such as query tagging, category prediction, query rewriting, and query recommendation. We also implemented a product representation learning framework that provides a unified product representation for product content understanding, intelligent product creatives, semantic recall, and semantic matching. Building on these query and product representations, we implemented semantic recall and semantic similarity models, completing the qualitative leap from literal matching to semantic matching. Besides increasing the relevance of search results and improving the user experience, semantic search also curbs, to some extent, the practice of stuffing product titles with popular keywords.
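
The difference between literal and semantic matching can be illustrated with a toy vector recall: the query and the products are compared by the cosine similarity of their representation vectors rather than by shared words. The three-dimensional vectors, the product names, and the functions `cosine` and `semantic_recall` below are invented for illustration; real representations would come from the learned query and product models.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_recall(query_vec, product_vecs, k=2):
    """Return the k product names whose vectors are closest to the query vector."""
    ranked = sorted(
        product_vecs,
        key=lambda name: (-cosine(query_vec, product_vecs[name]), name),
    )
    return ranked[:k]


# Hypothetical embeddings; the query text "summer dress" shares no word
# with any product title below.
QUERY_VEC = [0.9, 0.1, 0.0]
PRODUCT_VECS = {
    "beach sundress": [0.8, 0.2, 0.0],
    "floral skirt": [0.6, 0.5, 0.1],
    "winter coat": [0.0, 0.1, 0.9],
}
```

Here `semantic_recall(QUERY_VEC, PRODUCT_VECS)` returns `["beach sundress", "floral skirt"]` even though a literal keyword match on "summer dress" would recall nothing. A title stuffed with popular keywords gains nothing under this scheme unless its learned vector actually moves toward the query.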

In search personalization, we upgraded the original personalization system with a number of technologies. A deep user perception model based on multi-task learning learns general user representations from massive user behavior logs, which can be used for tasks such as user behavior recognition, preference estimation, personalized recall, and personalized ranking. Multi-modal fusion learning automatically combines a product's multi-dimensional features, such as text, images, tags, brand, category, store, and statistical features, into a unified product representation. Online deep learning to rank incorporates user state to produce a more accurate ranking model personalized to each user. A vector recall engine yields recall results that generalize better and effectively improves the depth of keyword and personalized matching. Deep transfer learning has allowed us to apply search personalization technology widely in scenarios beyond search. With the widespread use of these deep models, the accuracy of the personalization system has improved significantly.

In intelligent decision-making, we model the user's decision sequence from the interaction between the user and the engine during a search session and propose a search-session Markov decision process (MDP) model, which makes it possible to apply reinforcement learning to search ranking. In addition, to address the convergence of search results and the wasted exposure across different scenarios, we propose a multi-agent collaborative learning method that enables environment perception, communication between scenarios, individual decision-making, and joint learning among multiple heterogeneous scenarios, so that joint revenue is maximized rather than one scenario gaining at another's expense.

After the work and accumulation of these four stages, "retrieval era -> large-scale machine learning era -> large-scale real-time online learning era -> deep learning and intelligent decision-making era", we have gradually arrived at today's search ranking algorithm system.

3. Future Development: Exploration of Cognitive Intelligence

As described above, after years of development, search and recommendation, the two largest sources of organic traffic in Alibaba's e-commerce business, have gone all in on AI. Together they form a shopping decision system that learns user preferences online, matches traffic precisely, and makes intelligent decisions based on reinforcement learning.

In this process, however, most of what search, ranking, and recommendation learn comes from existing product tag data and user behavior data. A deeper understanding of products and users is still lacking, and the system still cannot fully grasp the real needs behind a user's multiple expressed intentions. For example, a user who searches for "sexy dress" may be looking for a "low-cut evening dress for an evening party" or an "off-the-shoulder beach dress for a beach vacation". A user who has saved "climbing shoes" and "trekking poles" may have a need for "mountain climbing equipment" and want to find more mountaineering products. A father who, at the start of the summer vacation, picks out "plug adapters" and looks at "British Museum tickets" may be planning a "summer family trip to the UK" and need related products from other categories.

The reason is that current artificial intelligence, especially models based on deep learning, owes its rapid progress in real-world applications mainly to massive data and large-scale computing power. By learning from digital abstractions of the physical world in a fixed, programmatic way, it is very good at acquiring knowledge within a bounded domain, but it struggles to acquire knowledge that lies outside the data, let alone to draw analogies, transfer knowledge, or reason. Machine cognitive intelligence, including autonomous learning and discovery and even creativity, is a higher level of artificial intelligence. General artificial intelligence still has a long way to go, but along the way, combining human knowledge with machine intelligence to achieve preliminary cognitive intelligence, and so give search and recommendation an intelligent experience, is the direction we are currently exploring.

To achieve cognitive intelligence, we first need a deeper understanding of users, products, sellers, and so on, and we need to systematically build a cognitive knowledge system for e-commerce. The figure below shows the three-dimensional "e-commerce-goods-market" cognitive map that we defined. It consists of four parts: users, scenes, categories, and products. These different types of concepts form a heterogeneous graph that links users, scenes, and products and supports a deep understanding of data along each dimension.

A scene is a semantic description of the relationships between products, a conceptual representation of user needs, and a bridge connecting users and products. From the product side, a scene can be understood as a semantically interpretable description of product relationships; for example, products that belong to the "Mid-Autumn Festival gift" scene share the attribute of being suitable as gifts for the Mid-Autumn Festival. From the user side, a scene can be regarded as a conceptual description of a user need, such as "outdoor barbecue" or "holiday wear". Scene relationships can be mined from behavioral data or supplied as industry or expert knowledge. Scenes, categories, and products together form a unified scene graph.

With such a cognitive map in place, identifying the user's real scene-level need through reasoning and computation makes it possible to gradually realize cognitive intelligence in search and recommendation. This involves another important part of the cognitive intelligence system: an online graph computing and reasoning engine built on the cognitive map. With this engine, the system can recognize the user's demand scene once the need has been expressed through behavior and then uncover and satisfy the user's deeper needs; it can use information from the map to expand and stimulate user needs; and, based on online data and user feedback, it can optimize scene mining and the construction of the cognitive map, continuously correcting and discovering scenes and improving its reasoning ability.
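
A very small sketch can show what "recognizing the demand scene from behavior and then expanding the need" might look like on a scene graph. Everything here is assumed for illustration: the two scenes and their categories, the overlap-ratio score, and the function names `infer_scene` and `expand_needs`. The actual engine's reasoning method is not described in this article.

```python
# Hypothetical scene graph: each scene maps to the categories that belong to it.
SCENE_TO_CATEGORIES = {
    "mountaineering": {"climbing shoes", "climbing rope", "headlamp"},
    "outdoor barbecue": {"grill", "charcoal", "headlamp"},
}


def infer_scene(behavior_categories, scene_graph=SCENE_TO_CATEGORIES):
    """Pick the scene whose categories are best covered by the user's behavior.

    behavior_categories is a set of categories the user searched, saved, or bought.
    Returns None when no scene overlaps with the behavior at all.
    """
    best_scene, best_score = None, 0.0
    for scene in sorted(scene_graph):  # sorted for deterministic tie-breaking
        categories = scene_graph[scene]
        score = len(behavior_categories & categories) / len(categories)
        if score > best_score:
            best_scene, best_score = scene, score
    return best_scene


def expand_needs(behavior_categories, scene_graph=SCENE_TO_CATEGORIES):
    """Suggest categories from the inferred scene that the user has not touched yet."""
    scene = infer_scene(behavior_categories, scene_graph)
    if scene is None:
        return []
    return sorted(scene_graph[scene] - behavior_categories)
```

For a user whose behavior covers `{"climbing shoes", "climbing rope"}`, `infer_scene` returns `"mountaineering"` and `expand_needs` returns `["headlamp"]`, a product category the user never searched for but plausibly needs.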

Behind the cognitive map and the online graph computing and reasoning engine lies, on the one hand, the in-depth application of technologies we have accumulated over time, including traditional techniques such as knowledge representation, storage, and reasoning, information retrieval, and natural language processing. On the other hand, the cognitive map can be deeply integrated with technologies that have seen breakthroughs in recent years, such as deep learning and reinforcement learning. For example, vectorized representations (embeddings) of entities and relations move entity retrieval and relation reasoning from the discrete to the continuous; the cognitive map can be integrated into existing deep supervised networks as an optimization constraint, so that domain knowledge is applied to the model more smoothly than through simple rules; and sequential decision-making can be introduced into knowledge reasoning, with reinforcement learning used to reduce the search space and speed up inference.
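
To show what moving relation reasoning "from the discrete to the continuous" can mean, the sketch below scores a (head, relation, tail) triple by how close head + relation lands to tail in vector space. This translation-style scoring is the idea behind the TransE family of knowledge graph embeddings; the article does not say which embedding model is used, so it is swapped in here purely as a familiar example. All vectors and names are hypothetical.

```python
import math

# Hypothetical 2-d embeddings for entities and one relation.
ENTITY_VECS = {
    "climbing shoes": [1.0, 0.0],
    "mountaineering": [1.0, 1.0],
    "outdoor barbecue": [-1.0, 1.0],
}
RELATION_VECS = {
    "belongs_to_scene": [0.0, 1.0],
}


def triple_score(head, relation, tail):
    """Negative Euclidean distance between (head + relation) and tail.

    A score closer to zero means the triple is more plausible.
    """
    h, r, t = ENTITY_VECS[head], RELATION_VECS[relation], ENTITY_VECS[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation, candidates):
    """Order candidate tail entities from most to least plausible."""
    return sorted(candidates, key=lambda tail: -triple_score(head, relation, tail))
```

Instead of looking up whether an edge exists, the system compares distances, so it can rank scenes for a product even when no explicit edge was ever recorded between them.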

With the cognitive map and the online reasoning engine, all kinds of subtle chemistry will emerge across areas such as site-wide search, recommendation, and shopping guidance, intelligent interaction, and content generation. As cognitive applications run, the system continuously iterates on and optimizes the cognitive map and the reasoning algorithms based on user feedback on the reasoning results, improving its cognitive computing capability. Step by step, we can build a comprehensive e-commerce cognitive intelligence system with the ability to learn, reason, and verify on its own.
