AliExpress (AE) is an e-commerce platform that mainly serves cross-border users. Understanding the consumption intentions and trends of local users has long been a time-consuming and laborious pain point for AliExpress, and data privacy and security policies for overseas users make it even harder. Beyond the purchase behavior data on AliExpress itself, e-commerce queries extracted from leading search engines such as Google are closer to the needs of local users. These queries reflect what users in a country actually want, free of on-platform bias, and analyzing their wording directly helps decide which products to present on the platform.
In 2021, our supply-side data science team set out to create more opportunities globally, both internally and externally, on the basis of data intelligence. We developed a comprehensive market-insight solution for keyword opportunities that draws on queries inside and outside the site and combines them with competitive data from Amazon and Cdiscount.
This time, through deep cooperation with the Spanish business and technical teams, we ran multiple trials of the solution and thoroughly discussed and resolved potential detailed problems. We built our keyword data capability from scratch and introduced a keyword opportunity product module into the platform's data bank. From a localization perspective, this module aims to answer how to provide the "right product" at the "right time" in the "right way". It is our first attempt to jointly drive business with data and products based on insight into micro-level market data.
The data capabilities are built in four modules: word sourcing, word processing (word normalization, named entity recognition, and trend prediction), the algorithm for selecting opportunity products by word, and word diagnosis. The difficulty of building keyword capabilities does not lie in any single module but in the design of the overall data solution framework. Because a keyword on its own carries limited information, the main problem is to obtain and integrate external data sources to build a more comprehensive interpretation of each word, so that products meeting users' immediate needs can be found. We also need to take AE's actual supply situation into account and create product sets that can be used flexibly in different scenarios.
To explore users' real needs, we designed word solutions that draw on more external data sources. The highlights of the overall process include:
Based on Ali Translation and AliNLP, it performs word normalization and named entity recognition on queries inside and outside the site, improving operation personnel's ability to identify effective words.
It introduces Google Trends and on-site UV to identify the off-season and peak season of search words.
It introduces search results from the competing websites Amazon and Cdiscount and matches them with same-style products on the site, so that product information outside the site can be monitored and acted on quickly.
It makes product selection targeted and efficient by combining AE supply with same-style product matching.
It connects to the downstream delivery process of search recommendation (SS), directly assisting operation personnel with sales tests and product posting.
At present, there are two kinds of data sources, one inside and one outside the site: top search words in AE (daily UV > 50) and Google's top e-commerce search keywords (Google updates the top e-commerce search keywords for each country every week). In addition, AE has access to word sources from Ahrefs, a third-party keyword search tool, and feeds these words into its SEO local-word project for external traffic acquisition.
Note: After words are ingested, they are tagged with category prediction results. Based on business sample inspections, the category relevance accuracy of words currently reaches 87.6%.
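The source-merging step can be sketched as follows. This is a minimal illustration, not the production pipeline: the function name `merge_word_sources`, the in-memory inputs, and the trim-and-lowercase key are assumptions; only the daily UV > 50 threshold comes from the text. Each keyword keeps its set of source tags, so that, for example, Ahrefs words can still be routed to the SEO project downstream.

```python
def merge_word_sources(ae_daily_uv, google_weekly_top, ahrefs_words, min_uv=50):
    """Merge keyword sources into {normalized keyword: set of source tags}.

    ae_daily_uv: dict of AE search word -> daily UV (only UV > min_uv is kept).
    google_weekly_top: iterable of Google top e-commerce keywords for one week.
    ahrefs_words: iterable of keywords exported from Ahrefs.
    """
    merged = {}

    def add(word, tag):
        # Hypothetical normalization: trim and lowercase so sources line up.
        merged.setdefault(word.strip().lower(), set()).add(tag)

    for word, uv in ae_daily_uv.items():
        if uv > min_uv:  # threshold from the text: daily UV > 50
            add(word, "ae")
    for word in google_weekly_top:
        add(word, "google")
    for word in ahrefs_words:
        add(word, "ahrefs")
    return merged


sources = merge_word_sources(
    {"men shoes": 1200, "rare word": 12},  # illustrative UV values
    ["Men Shoes", "funda movil"],
    ["funda movil"],
)
print(sources)
```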
Objective: To merge synonyms from multiple sources inside and outside the site. The basic stemming logic is: source words (normalized for case, with spaces and special symbols removed) > translate words into English (Ali Translation) > tokenize the English words algorithmically > aggregate and sort words by the result (sorting logic: UV_cnts within one day; indicator: word_rank) > take the highest-ranked source word as the stemming word in the source language.
Method: Cluster words with the same stem result based on STEM and Ranking, and keep the TOP 1 source-language word (SEO applications have no ranking abundance requirement). For example, the following table is obtained by normalizing Google-ES words and on-site words. Taking "man shoe" as an example, multiple similar queries in the table are normalized to "man shoe", and the final query obtained is "men shoes, uv_1d:2285875".
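A rough sketch of the normalization flow above. It assumes a hypothetical dictionary `TRANSLATIONS` in place of Ali Translation and a deliberately crude plural-stripping stemmer in place of the real tokenization algorithm; a production system would use a proper translator and stemmer (for example, Snowball). Words are grouped by their sorted stem tokens, and each group's representative is the source word with the highest one-day UV. Apart from 2285875, the UV values are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for Ali Translation: a tiny Spanish -> English lookup.
TRANSLATIONS = {"zapatos hombre": "men shoes"}
# Crude stemmer rules (assumption): map irregular plurals, strip a trailing "s".
IRREGULAR = {"men": "man", "women": "woman"}


def clean(word):
    """Lowercase, replace special symbols with spaces, and collapse whitespace."""
    word = re.sub(r"[^\w\s]", " ", word.lower())
    return " ".join(word.split())


def stem_token(token):
    token = IRREGULAR.get(token, token)
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]
    return token


def stem_key(word):
    """Clean -> translate -> tokenize -> stem; sorting tokens ignores word order."""
    cleaned = clean(word)
    english = TRANSLATIONS.get(cleaned, cleaned)
    return " ".join(sorted(stem_token(t) for t in english.split()))


def normalize(queries):
    """queries: list of (source word, uv_1d). Returns {stem key: (top word, uv_1d)}."""
    groups = defaultdict(list)
    for word, uv in queries:
        groups[stem_key(word)].append((word, uv))
    # Keep the highest-UV source word in each group as its representative.
    return {key: max(members, key=lambda m: m[1]) for key, members in groups.items()}


queries = [("men shoes", 2285875), ("Man Shoe", 1200),
           ("MEN  SHOES!", 800), ("zapatos hombre", 5000)]
print(normalize(queries))  # {'man shoe': ('men shoes', 2285875)}
```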
Objective: To parse the semantics of queries and give operation personnel more information, so that different applications can be developed for different scenarios.
Method: Word attributes are recognized with NER, and a classification model further labels words according to operation and application scenarios. The words and classifications that can currently be recognized are as follows. After industry evaluation and optimization of word attribute extraction, evaluation accuracy increased from 85.7% to 90%.
Because of the particular nature of language, some e-commerce words may be labeled incorrectly. We therefore use a semi-manual, semi-algorithmic recognition mode and have set up an operational blacklist mechanism in the product. Based on business feedback and category-related bad cases, specific keywords can be removed from the product, and the model can be further optimized.
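The semi-manual mode can be illustrated with a toy tagger. The real system uses an NER model plus a classification model; here a hypothetical gazetteer `GAZETTEER` stands in for both, and the labels (`brand`, `color`, `product`, `season`) are invented for illustration because the recognized classifications are not listed in this text. Bad cases reported by the business are modeled as a blacklist of (token, label) pairs that suppress a tag.

```python
# Hypothetical gazetteer standing in for the NER + classification models.
GAZETTEER = {
    "xiaomi": "brand",
    "red": "color",
    "dress": "product",
    "summer": "season",
}


def tag_attributes(query, blacklist=frozenset()):
    """Return {label: [tokens]} for a query, skipping blacklisted (token, label) pairs."""
    tags = {}
    for token in query.lower().split():
        label = GAZETTEER.get(token)
        if label is None or (token, label) in blacklist:
            continue  # unknown token, or a bad case reported by the business
        tags.setdefault(label, []).append(token)
    return tags


print(tag_attributes("Red summer dress"))
# {'color': ['red'], 'season': ['summer'], 'product': ['dress']}
feedback = {("summer", "season")}  # e.g. operations flagged this tag as wrong
print(tag_attributes("Red summer dress", blacklist=feedback))
# {'color': ['red'], 'product': ['dress']}
```

Keeping the blacklist outside the model lets operations correct labels immediately, while the collected bad cases can later be fed back as training data.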
Objective: To predict the peak weeks of words so that operations can be arranged in advance.
Method: The peak season and off-season of words are recognized from historical data sources inside and outside the site. Building on our research into Google data, we add the similar public data source Google Trends as a supplement. For word trend prediction, we compare the week-on-week and year-on-year data of Google Trends with Google BI data from the same period. This integrates multiple external data sources to enrich our information, and it also excludes the influence of promotions on word trend prediction, making the prediction more accurate.
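A minimal sketch of the trend features described above, assuming a weekly series (for example, a Google Trends index) stored as a list with the oldest week first and 52 weeks per year. The helpers `trend_features` and `peak_weeks`, the 1.5x-baseline peak rule, and the explicit set of promotion weeks are illustrative assumptions, not the team's actual model.

```python
from statistics import mean

WEEKS_PER_YEAR = 52


def growth(current, previous):
    """Relative change; None when the previous value is zero."""
    return None if previous == 0 else (current - previous) / previous


def trend_features(series, week):
    """Week-on-week and year-on-year growth for one week of a weekly series."""
    wow = growth(series[week], series[week - 1]) if week >= 1 else None
    yoy = (growth(series[week], series[week - WEEKS_PER_YEAR])
           if week >= WEEKS_PER_YEAR else None)
    return {"wow": wow, "yoy": yoy}


def peak_weeks(series, promo_weeks=frozenset(), ratio=1.5):
    """Weeks at least `ratio` x the non-promotion baseline; promotion weeks are ignored."""
    normal = [v for w, v in enumerate(series) if w not in promo_weeks]
    baseline = mean(normal)
    return [w for w, v in enumerate(series)
            if w not in promo_weeks and v >= ratio * baseline]


# Two years of illustrative weekly values with a recurring spike and one promotion.
series = [10] * 104
series[30], series[82] = 40, 45   # same week in consecutive years
series[60] = 90                   # promotion week, excluded below
print(peak_weeks(series, promo_weeks={60}))  # [30, 82]
print(trend_features(series, 82))            # {'wow': 3.5, 'yoy': 0.125}
```

Without the promotion filter, week 60 would also be reported as a peak, which is the distortion the text describes excluding.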
Method: The current training materials for selecting products by word come from two data sources: research words from on-site sources and research words from competing websites (Amazon and Cdiscount). Crawling search words on competing websites and matching same-style products are key innovations of this word-based opportunity product selection capability. Based on our understanding of AE's business and of the competing websites, we take their advantages into account: they have a greater volume of product transactions, a more comprehensive user distribution, and more localized merchants. Thus, we believe:
We therefore built the following algorithm process:
In the process:
Exploration model: Recommends related products with the Swing algorithm, starting from hot products inside and outside the site.
Prediction model: The business goal is to cultivate a Spain-oriented product pool and use good products to drive opportunity products, so that local users gradually form a habit of shopping on the platform. We therefore take product dpvCTR as the optimization goal and use CTR prediction models, including LR, LightGBM, and DeepFM.
Refined selection of same-style products: From the recalled products, better products are selected according to DSR, price, and logistics.
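The exploration and refined-selection steps can be sketched as follows. `swing_similarity` follows a common formulation of Swing: every pair of users u, v who both clicked items i and j contributes 1/(alpha + |I_u ∩ I_v|), so item pairs co-clicked by users who otherwise have little in common score higher. The optional per-user weighting from the original Swing paper is omitted. The `refine` step, its field names (`dsr`, `price`, `ship_days`), and its thresholds are hypothetical placeholders for the DSR, price, and logistics rules, and the CTR prediction model is not shown.

```python
from itertools import combinations


def swing_similarity(user_items, alpha=1.0):
    """Swing item-item scores from {user: set of clicked items}; keys are (i, j) with i < j."""
    sim = {}
    for u, v in combinations(sorted(user_items), 2):
        common = user_items[u] & user_items[v]
        if len(common) < 2:
            continue  # this user pair shares no item pair
        weight = 1.0 / (alpha + len(common))  # overlap-heavy user pairs count less
        for i, j in combinations(sorted(common), 2):
            sim[(i, j)] = sim.get((i, j), 0.0) + weight
    return sim


def related_items(seed, sim, k=3):
    """Top-k items most similar to a seed (hot) item."""
    scores = {}
    for (i, j), s in sim.items():
        if seed in (i, j):
            scores[j if i == seed else i] = s
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


def refine(candidates, min_dsr=4.5, max_ship_days=10):
    """Hypothetical refined selection: filter on DSR and delivery time,
    then prefer higher DSR and, on ties, lower price."""
    kept = [c for c in candidates
            if c["dsr"] >= min_dsr and c["ship_days"] <= max_ship_days]
    return [c["id"] for c in sorted(kept, key=lambda c: (-c["dsr"], c["price"]))]


clicks = {"u1": {"A", "B", "C"}, "u2": {"A", "B"},
          "u3": {"A", "B", "D"}, "u4": {"A", "C"}}
sim = swing_similarity(clicks)
print(related_items("A", sim))  # ['B', 'C']
```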
Objective: To diagnose, classify, and label keyword opportunities in advance based on insight into BA business and operation experience, so that truly valuable opportunity queries can be screened efficiently.
Method: In addition to common on-site indicators, we mainly provide off-site data evaluation indicators, such as the UV index of off-site searches and transaction indicators on competing websites (modeled as number of comments * price). These indicators show how popular a business is outside the site and how it is selling there.
Indicator criteria are divided into the following categories:
The indicator criteria vary with business needs. Each potential business party can provide its own criteria to the data team and then apply them in the opportunity center products.
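Keyword diagnosis with business-specific criteria might look like the sketch below. The transaction proxy multiplies competitor comment count by price, as the text describes; the `KeywordStats` fields, the `min_uv`/`min_proxy` thresholds, and the three output labels are hypothetical, since the actual indicator categories are not given here.

```python
from dataclasses import dataclass


@dataclass
class KeywordStats:
    keyword: str
    offsite_search_uv: int      # UV index of searches outside the site
    competitor_reviews: int     # number of comments on competitor listings
    competitor_price: float     # typical price on competitor listings


def transaction_proxy(stats):
    """Off-site transaction indicator modeled as number of comments * price."""
    return stats.competitor_reviews * stats.competitor_price


def diagnose(stats, criteria):
    """Label a keyword using per-business thresholds (hypothetical keys and labels)."""
    popular = stats.offsite_search_uv >= criteria["min_uv"]
    selling = transaction_proxy(stats) >= criteria["min_proxy"]
    if popular and selling:
        return "opportunity"
    if popular or selling:
        return "watch"
    return "skip"


spain_criteria = {"min_uv": 10_000, "min_proxy": 2_000}  # illustrative values
print(diagnose(KeywordStats("funda movil", 12_000, 300, 8.5), spain_criteria))  # opportunity
```

Passing the criteria as data rather than hard-coding them mirrors the idea that each business party supplies its own thresholds.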
So far, this capability has been successfully applied in the POC scenarios of the product campaign in Spain, and its value has been verified. This capability can be quickly reused in the supply of trend goods in France, Russia, and the United Kingdom. The scenario value verified in Spain includes:
We cooperated with the SEO team to build a local word library, crawled data from the third-party keyword search tool Ahrefs, and ran an application experiment in SEO. The goal is free traffic, both inside and outside the site. Based on data from the past week, Ahrefs performed as follows:
In March, we worked with the ranking business and operation personnel to build the on-site ranking capability, mainly optimizing the word-based product selection solution in two aspects:
The process was launched on March 28, with 94 keywords evaluated and 200 rankings established. The purpose of this POC was to complete an automatic ranking-building mechanism and to make operation personnel more efficient at discovering local opportunity words, matching products, and posting products. For example, the following tag bank accumulates keyword trend tags.
Even when we have obtained relevant external data and know what the trend is, the site may lack good tools to act on that trend immediately. Operation personnel waste a great deal of effort repeatedly searching BI for prior products or fishing out product titles by keyword for prismatic applications, and product recall is often inaccurate. After several POCs, we concluded that we need a platform that provides detailed keyword information and quickly delivers accurate word-based opportunity product selection. Such a tool is indispensable for AE to capture trends outside the site.
Opportunity word exploration interface: Based on each keyword's search ranking, historical trends, and on-site L-D conversion, operation personnel decide whether to create a product set for the application. The opportunity type is a BI-defined indicator that helps operation personnel quickly find effective keywords.
Product set creation interface based on opportunity words: Operation personnel review products and form product sets based on business decisions. These product sets have been applied in search recommendation scenarios, are used to build different premium product pools in different countries, and also feed personalized album releases (this word process module was launched in the opportunity center at the end of this month).
The product can automatically capture the following types of demand:
1) Periodic hot spots
2) Soaring hot spots in the site
3) Soaring hot spots outside the site
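One way to flag these three hot-spot types from weekly series, offered as a hedged sketch: a week counts as "soaring" when its value is at least `ratio` times the mean of the preceding `window` weeks, and a keyword counts as "periodic" when the off-site series soared in the same week one year earlier. The rule, the default parameters, and the function names are assumptions for illustration.

```python
from statistics import mean

WEEKS_PER_YEAR = 52


def is_soaring(series, idx, window=4, ratio=2.0):
    """True if series[idx] >= ratio x the mean of the `window` weeks before it."""
    if idx < window or idx >= len(series):
        return False
    base = mean(series[idx - window:idx])
    return base > 0 and series[idx] >= ratio * base


def hot_spot_types(onsite, offsite, window=4, ratio=2.0):
    """Classify the latest week of two aligned weekly series (oldest first)."""
    latest = len(offsite) - 1
    types = set()
    if is_soaring(offsite, latest - WEEKS_PER_YEAR, window, ratio):
        types.add("periodic")          # same week last year also spiked
    if is_soaring(onsite, len(onsite) - 1, window, ratio):
        types.add("soaring_onsite")
    if is_soaring(offsite, latest, window, ratio):
        types.add("soaring_offsite")
    return types


offsite = [10] * 60
offsite[7], offsite[59] = 30, 25       # spike a year ago and this week
onsite = [5] * 59 + [8]                # on-site growth below the 2x ratio
print(sorted(hot_spot_types(onsite, offsite)))  # ['periodic', 'soaring_offsite']
```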
Recent application results: the keyword click-through rate in ES is 30%-50% higher than that of the overall market.
At present, we are improving our word solution capabilities in two directions: capability building and scenario extension. To make word-based product selection more accurate, we will improve the search relevance of words and our ability to classify users, so that we understand the relationship between words and users and the product categories users are interested in. We will also extend the word solution to other scenarios. Because search words represent user demand (or traffic), and our capability lies in accurately locating the products behind those words, the scenarios we will expand to mainly include on-site rankings, opportunity products for off-site SEO rankings, and PPC opportunity products.
Building a process from scratch requires a great deal of input and discussion to find a data solution. Given our existing data capabilities and pain points, we needed to find, among a large number of external data sources, a unified and interpretable solution that could be applied in AE. We held thorough discussions with business, algorithm, and engineering personnel on the business rationality of each detail and the potential risks of the data technology. Many thanks to the whole group for their cooperation.
More Posts by AliCloud-Data Middle Office