By Liu Zhijia, Alibaba Cloud Intelligent Product Manager
Search has always been one of the core traffic portals of the e-commerce industry. Building an e-commerce search service and improving its search effect has long been a hard problem for e-commerce developers. Basic search services can be built on traditional databases or open-source engines, but as commodity data and business traffic grow, performance and effect bottlenecks become inevitable. Meanwhile, with the continued development of e-commerce, livestreaming, cloud computing, and other technologies, more traditional retail enterprises are moving to the cloud. Over the past two years, enterprises have been affected by the pandemic and other factors, and apps and mini programs have become an important source of business growth for retail enterprises. In this context, building an efficient search service has also become a hard problem for retail enterprises migrating to the cloud.
To solve these two problems, the Alibaba Cloud Computing Platform Division launched a search solution for the e-commerce and retail industries based on MaxCompute and Open Search. The solution provides a one-stop search development platform covering commodity storage, database building, search, and tuning.
This article describes how to build an e-commerce industry search service quickly and efficiently based on MaxCompute and Open Search from four aspects: product introduction, e-commerce industry features, industry search development practices, and more solutions.
Alibaba Cloud MaxCompute is a fully managed, analytics-oriented, enterprise-class SaaS-mode cloud data warehouse. It is simple to use and scales elastically to match business growth. For cloud developers, MaxCompute supports a variety of business analysis scenarios, such as Machine Learning Platform for AI, data lakes, traditional data warehouses, and near real-time data warehouses, and provides a more open development ecosystem.
MaxCompute uses a serverless architecture to provide a fast, fully managed online data warehouse service that helps enterprises minimize costs while meeting differentiated requirements. This removes the limitations of traditional data platforms in resource extensibility and elasticity, scheduling for periodic fluctuations, key task assurance, and requirements for stability and predictability. It supports business agility, minimizes O&M investment, and lets users analyze and process massive amounts of data economically and efficiently. These features make MaxCompute well suited to the computing and storage requirements of developers in the e-commerce and retail industries.
MaxCompute provides serverless data access services, multiple computing environments, storage services, and resource management, which significantly reduces O&M costs and lets users focus on business expansion and development.
In terms of ecosystem, MaxCompute is open in several directions: its own open product ecosystem, the Alibaba Cloud product solution ecosystem, the data application ecosystem, and integration with open-source engines and tools. Based on MaxCompute, developers can freely choose how to develop their business and customize product solutions more flexibly.
MaxCompute's integrated data warehouse, which combines offline, real-time, analysis, and serving workloads, is especially suitable for enterprise real-time data warehouse scenarios, interactive BI report queries, and user profile analysis. These scenarios are indispensable for commodity data storage, user behavior guidance, and analysis in the e-commerce industry.
Within Alibaba Group, MaxCompute serves as a best practice for instant query scenarios during Double 11. It can support write speeds of hundreds of millions of TPS and query petabytes of data with sub-second latency, which meets the high timeliness requirements of large-scale e-commerce scenarios. Based on these features, MaxCompute has become the preferred storage and computing service for cloud developers in the e-commerce industry.
As mentioned earlier, MaxCompute supports a variety of open ecosystems, such as integration with open-source tools and mainstream commercial software. MaxCompute can also be combined with other Alibaba Cloud products into one-stop solutions for big data applications commonly used in e-commerce, such as search and recommendation. In particular, for the search business in the e-commerce and retail industries, MaxCompute can be linked with Open Search to form a one-stop search development platform.
Alibaba Cloud Open Search is the search business mid-end of Alibaba Group and an intelligent search cloud service built on a big data and deep learning online service system. Open Search serves more than 500 businesses within Alibaba Group, such as Tmall, Hema, and Cainiao, and supports ten billion search visits daily. During Double 11, it supported the search services of various products within Alibaba Group, with a peak of over one million QPS for a single search service. Open Search has been commercially available on Alibaba Cloud since 2014 and has provided search services to thousands of customers, including hundreds of e-commerce and retail enterprises.
Open Search provides core engines, recall, sorting, search guidance, and other services and capabilities that cover the stages before, during, and after a search, enabling one-stop search business development. For experienced search developers, Open Search opens up application structure, recall, sorting, algorithms, and other links to meet customization needs. For novices and operators, Open Search provides industry templates for e-commerce, education, and other industries, so that a search service with good results can be built in one click to help enterprises achieve their business objectives.
Especially for the e-commerce industry, Open Search provides search methods and solutions for multiple scenarios, such as product search, order search, store search, database acceleration, and analysis.
The e-commerce industry is highly transaction-oriented and GMV-oriented. Its ultimate goal is to guide more and higher-value purchase transactions and achieve a win-win situation for e-commerce platforms, buyers, and sellers. Currently, search and recommendation are the most important traffic portals in the e-commerce industry. As in the three apps in the figure, the search portal is placed at the core of the app so that users can find it immediately. Below it are other sub-applications or commodity category filters, and below those is the recommended feed stream. The data shows that more than 90% of GMV comes from traffic guided by search and recommendation.
When users open an e-commerce app with a clear purchase demand, they are very likely to search for the target product, and in this scenario, the guided purchase rate and conversion rate are very high. Therefore, the search effect is very important for the e-commerce industry.
How do you measure the effectiveness of search? Based on years of search experience in the e-commerce industry, we divide the core indicators of e-commerce search into effect indicators and performance indicators. The effect indicators include click-through rate and no-result rate, and the performance indicators include search response time and data synchronization response time. In simple terms, the goal is to enable end users to find target products faster and more accurately.
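As a rough illustration of these indicators, the following Python sketch computes click-through rate, no-result rate, and average response time from a small hypothetical search log. The `SearchLog` fields and the definition of click-through rate as the share of searches with at least one click are assumptions made for this example, not the definitions used by Open Search.

```python
from dataclasses import dataclass


@dataclass
class SearchLog:
    """One search request in a hypothetical log (field names are assumptions)."""
    query: str
    result_count: int   # number of products returned
    clicked: bool       # whether the user clicked at least one result
    latency_ms: float   # search response time in milliseconds


def search_metrics(logs):
    """Compute effect indicators (CTR, no-result rate) and one performance indicator."""
    total = len(logs)
    if total == 0:
        return {"ctr": 0.0, "no_result_rate": 0.0, "avg_latency_ms": 0.0}
    clicks = sum(1 for log in logs if log.clicked)
    empty = sum(1 for log in logs if log.result_count == 0)
    return {
        "ctr": clicks / total,
        "no_result_rate": empty / total,
        "avg_latency_ms": sum(log.latency_ms for log in logs) / total,
    }


logs = [
    SearchLog("nike sneakers", 120, True, 35.0),
    SearchLog("aple phone", 0, False, 20.0),
    SearchLog("blue dress", 48, False, 41.0),
    SearchLog("headphones", 75, True, 24.0),
]
metrics = search_metrics(logs)
print(metrics)
```

Tracking these values before and after each tuning change is one simple way to check whether a change actually improves the search effect.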
Search queries in the e-commerce industry also differ from those in other industries. Users habitually pile up keywords when searching. For example, when a query does not find the desired product, they continue to append supplementary terms to filter the search results. As a result, word order has less influence on e-commerce queries than on queries in other industries. Moreover, many general e-commerce apps contain commodity information from all walks of life, so the same word can represent different information in different contexts.
Because of these special features of e-commerce search queries, users who build their own search on databases or open-source engines often encounter problems such as low query recall, poor document relevance, and unsatisfactory sorting results, which affect the search effect and purchase conversion.
In terms of user intent recognition, the same word entered by different users in different scenarios may cover products in many fields. For example, when a user enters the word Apple, the user may be referring to mobile phones, fruits, tablets, headphones, notebooks, or other categories. This was a bad case frequently encountered in the early days of self-built e-commerce search based on open-source solutions. So, how can we solve these problems and bad cases, optimize the search effect of the e-commerce industry, and improve search-guided GMV?
E-commerce search services involve multiple dimensions, such as commodity data, search queries, and user behavior, and multiple links before, during, and after the search. When we work with different enterprises, customers raise a variety of questions. Developers without much search experience may ask: How can we build a commodity database? How can we accurately understand the user's query intent? Experienced developers may ask: How do you provide users with a personalized search experience? How can we ensure performance in high-concurrency scenarios?
To help developers in the e-commerce and retail industries solve these problems faster and better, MaxCompute and Open Search jointly propose an industry search solution.
Overall, the commodity data and behavior data stored in MaxCompute are transmitted to Open Search through automatic database synchronization or API/SDK synchronization. Query analysis, sorting, search guidance, intervention, and extension functions are then customized in Open Search. The result is a fully managed, O&M-free e-commerce search solution with high performance, high real-time capability, high reliability, and better search results.
Following the search behavior of users, this solution can be divided into five key links: building search applications, entering query terms, recognizing user intent, accessing search engines, and returning search results. These correspond to the development of five modules: MaxCompute database creation, search guidance, query analysis, search engines, and sorting services.
In the product database building stage, users store their product data and user behavior data in MaxCompute. Open Search provides e-commerce industry templates so that developers can create a search application structure in one click and build databases quickly. Next, users define the type and meaning of each field and the associations between tables, based on the fields in MaxCompute or a custom application structure in Open Search. Then, according to the search requirements of different business scenarios, fields are combined into target indexes, and queries are searched in the corresponding indexes. For example, product name, store name, and product category are common search fields in the e-commerce industry. You can build these fields into one index, so that a user query is matched against product and store information in these fields. After the index structure is defined, Open Search starts building the search service. When the application status becomes available, the basic version of the search service is ready.
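The idea of combining several fields into one target index can be sketched in a few lines of Python. This is only a toy in-memory inverted index, not the Open Search application structure or API. The product records, the field names, and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical product records, standing in for a table synchronized from MaxCompute.
products = [
    {"id": 1, "title": "Nike high top basketball shoes",
     "store": "Nike Official Store", "category": "shoes"},
    {"id": 2, "title": "Blue running sneakers",
     "store": "Sports World", "category": "shoes"},
    {"id": 3, "title": "Fresh red apples",
     "store": "Green Farm", "category": "fruit"},
]

# Fields that are combined into a single search index.
INDEX_FIELDS = ("title", "store", "category")


def build_index(docs, fields):
    """Build an inverted index: term -> set of product ids, over the chosen fields."""
    index = defaultdict(set)
    for doc in docs:
        for field in fields:
            for term in doc[field].lower().split():
                index[term].add(doc["id"])
    return index


def search(index, query):
    """Return ids of products that contain every query term in any indexed field."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


default_index = build_index(products, INDEX_FIELDS)
print(search(default_index, "nike shoes"))
print(search(default_index, "shoes"))
```

Because the store name and category are part of the same index, a query such as "nike shoes" can match a term from the title and a term from the category at the same time.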
Before users enter search queries, e-commerce apps often provide preset search queries. This process is called search guidance. Currently, common pre-search guidance modules include hot search and shading. Hot search provides popular search terms based on recent hot events and user search behavior, and users can click these terms directly. Shading means that a preset query is displayed in the search box before the user enters a search term, and the user can click search directly to search for it. Hot search and shading are an important part of the search link. On the one hand, they guide users' search behavior and reduce the difficulty of tuning in subsequent links. On the other hand, they can serve different operating goals at different times to improve search-guided purchases. Open Search supports automatic training of hot search and shading models and supports manual intervention in timing and position through blacklists and whitelists, so that operators can guide the results manually.
Another commonly used form of search guidance is the drop-down prompt. While the user is typing a query, other candidate queries are suggested automatically, which reduces input costs and guides traffic. Currently, Open Search supports a variety of methods for building drop-down prompt models, as well as drop-down prompt extension functions such as high-frequency search terms, historical search terms, intelligent sorting, and manual intervention.
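A drop-down prompt with manual intervention can be approximated as follows. This is a minimal sketch that ranks historical queries by frequency. The `suggest` function, its `blacklist` and `pinned` parameters, and the sample history are hypothetical, and the trained models that Open Search uses are not shown here.

```python
from collections import Counter


def suggest(prefix, history, blacklist=frozenset(), pinned=(), limit=5):
    """Suggest candidate queries for a typed prefix.

    history   -- past queries; more frequent queries rank higher
    blacklist -- queries that operators never want to show
    pinned    -- queries that operators want to show first (a whitelist-style override)
    """
    prefix = prefix.lower().strip()
    counts = Counter(q.lower().strip() for q in history)
    candidates = [q for q in counts if q.startswith(prefix) and q not in blacklist]
    # Sort by descending frequency, then alphabetically for a stable order.
    candidates.sort(key=lambda q: (-counts[q], q))
    pinned_hits = [q for q in pinned if q.startswith(prefix) and q not in blacklist]
    merged = pinned_hits + [q for q in candidates if q not in pinned_hits]
    return merged[:limit]


history = [
    "nike shoes", "nike shoes", "nike socks", "nike shoes",
    "nike socks", "nike hat", "nikon camera",
]
print(suggest("nik", history))
print(suggest("nik", history, blacklist={"nike hat"}, pinned=("nike jacket",)))
```

The second call shows the operator's view: a pinned query is promoted to the top for a campaign, and a blacklisted query is suppressed regardless of how often users searched for it.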
Search guidance through hot search, shading, and drop-down prompts improves the user's search experience and allows manual operation to drive purchase conversion.
A search request is initiated after the user clicks a guided query or enters a query manually.
First of all, we need to understand the user's search intent. As mentioned earlier, users in the e-commerce industry often use colloquial expressions or pile up keywords when entering search queries. Therefore, we need to transform queries that users describe from the perspective of purchasing requirements into structured, relatively clear, and standardized expressions. This is the user intent recognition process.
Common user intent recognition functions include synonym expansion, stop word removal, spelling correction and rewriting, entity recognition, and category prediction.
Next, let's walk through the user intent recognition link in detail with an example.
For example, suppose the user enters the query NIKE's blue sneaker high top. First, punctuation and case are normalized, so the query becomes nike's blue sneaker high top. Then, the query is segmented by the e-commerce industry tokenizer. Next comes stop word removal. For example, 's is configured as a meaningless word, so it is dropped and the query becomes nike blue sneaker high top. Next, spelling correction fixes typos and rewrites terms, for example, turning the query into nike basketball shoes. Next, industry entity recognition, a technique often used in the industry, analyzes the meaning of each term. The result is nike: brand, basketball shoes: category, high top: style. In addition, Open Search supports category prediction. Based on the results above, each term in the query is given a weight, such as nike: high, basketball shoes: medium, and high top: medium, and the query is expanded with additional search terms, such as nike or Nike sneaker high top. Finally, after this layer-by-layer rewriting, a query that the engine can understand is output and passed to the search engine.
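The layer-by-layer rewriting described above can be sketched as a small pipeline. This is a toy illustration rather than the Open Search query analysis implementation. The dictionaries for stop words, corrections, phrases, entities, and weights are hypothetical, the sample query contains a deliberate typo, and whitespace splitting stands in for industry word segmentation.

```python
import re

# Hypothetical dictionaries; a real system would learn these from e-commerce data.
STOP_WORDS = {"s", "the", "a", "for"}
CORRECTIONS = {"sneeker": "sneaker", "nkie": "nike"}
PHRASES = {("high", "top"): "high-top"}
ENTITIES = {"nike": "brand", "sneaker": "category", "high-top": "style", "blue": "color"}
WEIGHTS = {"brand": "high", "category": "medium", "style": "medium", "color": "low"}


def analyze(query):
    """Rewrite a raw query into (term, entity type, weight) triples."""
    # 1. Normalize case and punctuation.
    text = re.sub(r"[^a-z0-9\s]", " ", query.lower())
    # 2. Tokenize (whitespace split as a stand-in for industry word segmentation).
    tokens = text.split()
    # 3. Remove stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 4. Correct known spelling errors.
    tokens = [CORRECTIONS.get(t, t) for t in tokens]
    # 5. Merge known two-word phrases into a single term.
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            merged.append(PHRASES[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    # 6. Tag each term with an entity type and a weight.
    result = []
    for term in merged:
        entity = ENTITIES.get(term, "unknown")
        result.append((term, entity, WEIGHTS.get(entity, "low")))
    return result


analysis = analyze("NIKE's blue sneeker high top")
print(analysis)
```

The output keeps the brand term with a high weight and the category and style terms with a medium weight, which mirrors the kind of weighted, structured query that is handed to the engine.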
After the query is rewritten, the search engine recall phase begins. Open Search provides a variety of recall strategies, including text recall, personalized recall, and vector recall. Text recall is the most common recall strategy in the search field. It compares the text relevance between the rewritten query and the product data and uses inverted indexes to implement the recall. Open Search uses the Qitian 3 text search engine developed by Alibaba Group, which handles search tasks in high-concurrency and multi-write scenarios with high performance and returns search results faster. Personalized recall adds personalized user information to the rewritten query and returns personalized search results. Vector recall adds vector information to the rewritten query and returns results based on the vector similarity between the query and the commodity data. Traditional text search may miss results that do not look relevant textually but actually match what the user wants, and vector recall can solve this problem. Running text recall and vector recall together as a multi-channel search can reduce the no-result rate and significantly improve the search effect.
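The following sketch shows, under simplifying assumptions, why multi-channel recall lowers the no-result rate. Text recall is reduced to term overlap over a linear scan instead of a real inverted index, and the three-dimensional product vectors and the query vector are made-up values that stand in for the output of an embedding model.

```python
import math

# Hypothetical products with made-up embedding vectors.
products = {
    1: {"title": "nike high top basketball shoes", "vec": [0.9, 0.1, 0.0]},
    2: {"title": "adidas running sneakers", "vec": [0.8, 0.2, 0.1]},
    3: {"title": "fresh red apples", "vec": [0.0, 0.1, 0.9]},
}


def text_recall(query, docs):
    """Recall products whose title shares at least one term with the query."""
    terms = set(query.lower().split())
    return [pid for pid, doc in docs.items() if terms & set(doc["title"].split())]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_recall(query_vec, docs, threshold=0.8):
    """Recall products whose vector is similar enough to the query vector."""
    return [pid for pid, doc in docs.items() if cosine(query_vec, doc["vec"]) >= threshold]


def multi_channel_recall(query, query_vec, docs):
    """Merge both channels, keeping the first occurrence of each product id."""
    merged = []
    for pid in text_recall(query, docs) + vector_recall(query_vec, docs):
        if pid not in merged:
            merged.append(pid)
    return merged


# "trainers" appears in no title, so text recall alone returns nothing.
query, query_vec = "trainers", [0.85, 0.15, 0.05]
print(text_recall(query, products))
print(multi_channel_recall(query, query_vec, products))
```

In this example, the text channel finds no match for "trainers", while the vector channel still recalls the two shoe products because their vectors are close to the query vector.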
After the recall phase, we have a set of commodity data related to the user's search needs. Next, we need to sort the recalled commodities and return them to users in the most reasonable order, so that the results users are most likely to click are ranked first, which improves search-guided conversion and GMV. Open Search provides a two-round sorting mechanism of rough sorting and fine sorting and supports multiple sorting methods, such as sorting expressions, custom plug-ins, and algorithm models. The internal sorting process is fully open to developers, who can customize their own sorting strategies based on business requirements.
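The two-round mechanism can be illustrated with a short sketch. A cheap rough score trims the recalled set, and a more expensive fine score orders the shortlist. The feature names, the weights in `fine_score`, and the cut-off sizes are assumptions chosen for this example, not Open Search sorting expressions.

```python
def rough_score(doc):
    """Cheap first-round score: text relevance only."""
    return doc["text_relevance"]


def fine_score(doc):
    """Second-round score that mixes in behavior signals (weights are illustrative)."""
    return (0.5 * doc["text_relevance"]
            + 0.3 * doc["predicted_ctr"]
            + 0.2 * doc["sales_score"])


def two_round_sort(recalled, rough_keep=3, final_keep=2):
    """Keep the top rough_keep items by rough score, then re-rank them by fine score."""
    shortlist = sorted(recalled, key=rough_score, reverse=True)[:rough_keep]
    return sorted(shortlist, key=fine_score, reverse=True)[:final_keep]


# Hypothetical recalled products with precomputed features.
recalled = [
    {"id": "A", "text_relevance": 0.9, "predicted_ctr": 0.1, "sales_score": 0.2},
    {"id": "B", "text_relevance": 0.8, "predicted_ctr": 0.9, "sales_score": 0.9},
    {"id": "C", "text_relevance": 0.7, "predicted_ctr": 0.5, "sales_score": 0.5},
    {"id": "D", "text_relevance": 0.2, "predicted_ctr": 1.0, "sales_score": 1.0},
]
ranked_ids = [doc["id"] for doc in two_round_sort(recalled)]
print(ranked_ids)
```

Product D has strong behavior signals but is dropped in the rough round because of its low text relevance, while B overtakes A in the fine round. Splitting the work this way means the expensive score only has to be computed for a small shortlist.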
For custom plug-ins, Open Search provides the Cava language and Cava plug-ins. Cava is a compiled language developed by Alibaba. Its syntax is similar to Java, its performance is comparable to C++, and it supports object-oriented programming. An IDE that supports Cava compilation is integrated into the Open Search console, so users can write, compile, and debug custom Cava plug-ins directly on the console.
To sum up, with MaxCompute and Open Search, users can develop e-commerce and retail search services that cover commodity database building, search guidance, user intent recognition, search engine recall, and result sorting, with better performance and full customization.
First of all, word segmentation is the most basic and indispensable part of Chinese search. Open Search integrates the same e-commerce tokenizer model as Taobao Search. Its training corpus comes from millions of labeled e-commerce data entries accumulated by Taobao Search over the years. We compared the Open Search general e-commerce tokenizer with the open-source IK tokenizer. Among the 100 real e-commerce search queries we tested, the Open Search tokenizer produced better word segmentation results than IK on 63 queries, and the ratio of better to worse results exceeded 4:1.
Building on the general e-commerce tokenizer, we worked with the DAMO Academy Natural Language Processing Team to optimize the e-commerce industry template and introduced the e-commerce enhanced analyzer and the corresponding query analysis algorithms. Specifically, the F1 score of e-commerce word segmentation increased to 95%, the F1 score of entity recognition increased to 80%, and the FAR of spelling correction decreased by 1.4%. In addition, more than 100,000 e-commerce synonyms were added. These results are at the leading level in the e-commerce NLP field.
The following is a comparison between the general analyzers and the enhanced analyzers in the e-commerce industry. We also provide algorithm customization services for e-commerce and retail customers in different fields and categories, including user-level customized query analysis, CTR estimation, vector models, and personalized models, to improve the search effect in all aspects.
We provide a one-click configuration capability for e-commerce users, especially for retail users that have just started their cloud transformation. Users only need to select the search-related functions they want on the console, such as recall, query analysis, sorting, and peripheral services. The corresponding application structure, index structure, and function strategies are then generated automatically, which achieves an all-around one-click configuration of e-commerce search.
The following section briefly introduces two typical customer cases of e-commerce and retail search. The first is an e-commerce shopping platform app that provides users with functions such as commodity search and coupon shopping guides. The customer initially chose to develop its own search but soon encountered bottlenecks. For example, with an index of hundreds of millions of commodities, complex search and filtering requirements often affected search performance, and peak traffic increased significantly, especially during e-commerce promotion periods. After investigating various products and solutions, the customer chose the MaxCompute + Open Search solution. MaxCompute's flexible O&M mechanism fits e-commerce scenarios well, and Open Search provides performance and effect guarantees for search services. After continuous use, we have received good feedback from the customer, especially regarding stability in engineering and O&M. This lets the customer concentrate on business and algorithms and promote product revenue and development.
The other case is a retail customer that recently adopted this service: a supermarket retail brand with more than 10,000 stores worldwide. With the rapid development of the domestic new retail market, online business is particularly important for expanding quickly and enhancing brand influence. At first, the customer also chose a self-developed search solution and applied it to its online mall, but the effect fell far short of expectations, and the shopping experience was poor. Recently, the customer adopted the Open Search e-commerce industry template, which significantly improved the search effect through built-in multi-channel recall, personalized sorting, and other functions. Half a month after adoption, the overall purchase conversion rate increased by 10%, and the no-result rate dropped from 29% to 7.5%. The customer also specifically mentioned the fully managed cloud service model of MaxCompute + Open Search, which significantly reduces personnel investment and O&M costs and makes the solution highly cost-effective.
In the e-commerce industry, in addition to commodity search scenarios, there are a variety of simple conditional search scenarios, such as order search, favorite search, and category search. In these scenarios, MaxCompute + Open Search can provide database retrieval acceleration services to ensure high-performance and real-time search.
The vector recall capability of Open Search also supports searching by user-submitted images to help users find the goods they want. Image search has become another typical application scenario and search method.
In addition, combined with other cloud products such as Alibaba Cloud Artificial Intelligence Recommendation (AIRec), the solution can support search, recommendation, and advertising applications in the e-commerce industry.
In another direction, Open Search is opening up its engine capabilities and making its built-in core engine available on the cloud for more developers to use. The official launch is expected at the end of September. It will provide a more open ecosystem and all-around user customization capabilities.