Opportunity Discovery and Application of Keywords by AliExpress

This article discusses how AliExpress discovers opportunities intelligently through intensive keyword data analysis.


AliExpress (AE) is an e-commerce platform that mainly serves cross-border users. Obtaining the consumption intentions and consumption trends of local users has long been a time-consuming and laborious pain point for AliExpress, and it is compounded by the data privacy and security policies that apply to overseas users. Compared with direct purchase-behavior data on AliExpress, e-commerce queries extracted from leading search engines like Google are closer to the needs of local users. These queries reflect the unbiased user mindset in a country, and analyzing the words users type lets us present matching products on the platform.

In 2021, our supply-side data science team aims to create more opportunities globally, both internally and externally, on the basis of data intelligence. We have developed a comprehensive keyword-opportunity solution for market insight, based on queries inside and outside the site combined with competitive data from Amazon and Cdiscount.

This time, through deep cooperation with the Spanish business and technical teams, we ran multiple trials of the solution and discussed and resolved the detailed problems that came up. We built our keyword data capability from scratch and introduced a keyword opportunity module into the platform's data bank product. This module aims to help us answer how to provide the "right product" at the "right time" in the "right way" from the perspective of localization. It is our first attempt to drive business jointly through data and products on the basis of insight into micro-level market data.

Data Capacity Building

The building of data capabilities is divided into four modules: word source capability, word processing capability (word normalization, named entity recognition, and trend prediction), the algorithm for selecting opportunity products through words, and word diagnosis. The difficulty of keyword capability building does not lie in any specific module but in the design of the overall data solution framework. Because a keyword alone carries limited information, the main problem is to obtain and integrate other external data sources so that each word can be interpreted more comprehensively and products that meet users' immediate needs can be found. In addition, we need to take into consideration the actual supply situation in AE and create product sets that can be used flexibly in different scenarios.

A Review of the Overall Data Production Process


  • Search words within AE alone cannot meet the requirements of localized operational scenarios. For example, queries in AE lag behind the demand visible on Google.
  • Previously, top search words from outside the site were passed directly to the AE search engine to obtain product sets for marketing calendars. However, the queries were not further interpreted or filtered. As a result, products selected through low-value words failed to achieve the desired business effect.

To explore the real needs of users, we have designed word solutions with more external data sources. The highlights of the overall process include:

  • Based on Ali Translation and ALiNLP, the process conducts word normalization and named-entity recognition for queries inside and outside the site. This improves the ability of operation personnel to identify effective words.
  • It introduces Google Trend and UV inside the site to identify the off-season and peak-season trends of search words.
  • It introduces search results from the competing websites Amazon and Cdiscount and matches them with products of the same style inside the site, so that product information outside the site can be monitored and acted on quickly.
  • It makes product selection targeted and efficient by combining AE supply with same-style product matching.
  • It connects to the downstream delivery process of search recommendation (SS) to directly assist operation personnel in sales tests and product posting.


Word Source Ability

At present, there are two kinds of data sources inside and outside the site: top search words in AE (daily UV > 50) and Google top search keywords of e-commerce (Google updates top search keywords of e-commerce in each country every week). In addition, AE has access to word sources from Ahref, a third-party keyword search tool, and puts words in its project of SEO local words for external traffic acquisition.


Note: After words are ingested, each one is tagged with its predicted category. Currently, according to business sample inspections, the category relevance accuracy of words reaches 87.6%.

Word Normalization

Objective: To merge synonyms coming from multiple sources inside and outside the site. Basic stemming logic: clean the source words (normalize case, remove extra spaces and special symbols) > translate the words into English (Ali Translation) > tokenize and stem the English words algorithmically > aggregate and sort the words by stem (sorting logic: UV_cnts within one day, indicator: word_rank) > take the highest-ranked source word as the normalized word in the source language.

Method: Queries with the same stem are clustered, and within each cluster the TOP 1 source-language query by ranking is kept (the SEO application has no such ranking requirement). For example, take Google-ES queries and queries inside the site that share the stem "man shoe". Normalization merges these similar queries under "man shoe" and finally outputs the top-ranked query "men shoes, uv_1d:2285875".
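The normalization steps above can be sketched roughly as follows. This is a minimal illustration, not the team's implementation: the `TRANSLATIONS` dictionary stands in for Ali Translation, the toy `IRREGULAR` map and plural stripping stand in for a real stemmer, sorting tokens inside the key is our own assumption, and every UV figure except the "men shoes" one quoted above is made up.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the translation step (the article uses Ali Translation).
TRANSLATIONS = {"zapatos hombre": "men shoes"}

# Toy lemma map standing in for a real stemming algorithm.
IRREGULAR = {"men": "man", "women": "woman"}


def clean(query):
    """Lowercase, replace special symbols with spaces, collapse whitespace."""
    query = re.sub(r"[^\w\s]", " ", query.lower())
    return " ".join(query.split())


def stem_key(english_query):
    """Build a grouping key from naively stemmed, sorted tokens (assumption)."""
    tokens = []
    for tok in english_query.split():
        tok = IRREGULAR.get(tok, tok)
        if len(tok) > 3 and tok.endswith("s") and not tok.endswith("ss"):
            tok = tok[:-1]  # naive plural stripping
        tokens.append(tok)
    return " ".join(sorted(tokens))


def normalize(rows):
    """rows: iterable of (source_word, uv_1d). Returns stem -> top source word."""
    groups = defaultdict(list)
    for source_word, uv_1d in rows:
        cleaned = clean(source_word)
        english = TRANSLATIONS.get(cleaned, cleaned)
        groups[stem_key(english)].append((cleaned, uv_1d))
    # Keep the source word with the highest one-day UV in each stem group.
    return {key: max(members, key=lambda m: m[1]) for key, members in groups.items()}


rows = [
    ("Men Shoes", 2285875),   # figure quoted in the article
    ("man-shoe", 1200),       # made-up UV
    ("zapatos hombre", 5000), # made-up UV
]
print(normalize(rows))  # {'man shoe': ('men shoes', 2285875)}
```

Grouping on the stem while reporting the highest-UV source word keeps the output readable for operation personnel, since they see a real query rather than a stem.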


Word Attribute Extraction

Objective: To parse the semantics of queries and give operation personnel more information, so that different applications can be developed for different scenarios.

Method: Different word attributes are recognized through NER, and a classification model further tags the words according to operation and application scenarios. After industry evaluation and optimization of word attribute extraction, the evaluated accuracy has increased from 85.7% to 90%.


Word Shielding Mechanism

Due to the special nature of language, some e-commerce words may be tagged incorrectly. Therefore, we use a recognition mode that is partly manual and partly algorithmic. We have set up an operation blacklist mechanism in the product. Based on words reported by the business and category-related bad cases, some keywords can be removed from the product, and the model can be optimized.

Word Trend Prediction

Objective: To predict the peak weeks of words so that operations can be arranged in advance.

Method: The peak season and off-season of words are recognized from historical data sources inside and outside the site. Building on our research into the Google data, we add the similar public data source Google Trend as a supplement. For word trend prediction, we compare the week-on-week and year-on-year data of Google Trend with the Google BI data from the same period. This integrates multiple external data sources and enriches our information. It also excludes the influence of promotions on word trend prediction, making the prediction more accurate.
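As a rough illustration of the week-on-week and year-on-year comparison described above, the sketch below computes both changes for a weekly index series and flags candidate peak weeks. The function names, the `weeks_per_year` parameter, and the "at least 80% of the maximum" peak rule are assumptions made for the example; the article does not specify how peaks are defined.

```python
def pct_change(current, previous):
    """Relative change, or None when the baseline is zero."""
    if previous == 0:
        return None
    return (current - previous) / previous


def weekly_signals(series, weeks_per_year=52):
    """Attach week-on-week and year-on-year changes to each weekly value."""
    signals = []
    for i, value in enumerate(series):
        wow = pct_change(value, series[i - 1]) if i >= 1 else None
        yoy = (
            pct_change(value, series[i - weeks_per_year])
            if i >= weeks_per_year
            else None
        )
        signals.append({"week": i, "value": value, "wow": wow, "yoy": yoy})
    return signals


def peak_weeks(series, ratio=0.8):
    """Weeks whose value is at least `ratio` of the series maximum (assumed rule)."""
    threshold = ratio * max(series)
    return [i for i, value in enumerate(series) if value >= threshold]


# Made-up weekly search index for one keyword; a 4-week "year" keeps the demo short.
index = [10, 12, 50, 48, 11, 9]
print(peak_weeks(index))
print(weekly_signals(index, weeks_per_year=4)[4])
```

Running the same functions on an on-site UV series and an off-site index for the same weeks gives two sets of signals that can be compared before a word is labeled as peak season.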


Ability of Opportunity Product Selection through Words

Method: The current training material for product selection through words comes from two data sources: search words from sources on the site and search words on competing websites (data from Amazon and Cdiscount). Crawling search words on competing websites and matching products of the same style are important innovations in our opportunity product selection capability. Based on our understanding of the AE business and of the competing websites, we take the competitors' advantages into account: they have a greater volume of product transactions, a more comprehensive user distribution, and more localized merchants. Thus, we believe:

  • Compared with AE, the products recalled by competitors' search engines contain more localized hot products. They can supplement our localized opportunity discovery.
  • The sales and CPV features of trending products on competing websites can be used to build a model for our product selection, either as feature input or for product diagnosis.


Therefore, we built an algorithm process with the following stages:

Exploration model: It recommends related products through the Swing algorithm on the basis of hot products inside and outside the site.
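For readers unfamiliar with Swing, the sketch below shows the commonly described form of the item-to-item score: two items are similar when many pairs of users interacted with both, and each user pair contributes less the more their overall item sets overlap. The article does not give its formula or parameters, so the `alpha` smoothing value, the function name, and the click data here are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def swing_similarity(user_items, alpha=1.0):
    """user_items: dict mapping user -> set of items the user interacted with.

    Returns a dict mapping (item_i, item_j), with item_i < item_j, to a score.
    """
    # Invert the mapping: item -> users who interacted with it.
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    sim = defaultdict(float)
    for i, j in combinations(sorted(item_users), 2):
        common_users = sorted(item_users[i] & item_users[j])
        # Every pair of users who both touched i and j adds a damped vote.
        for u, v in combinations(common_users, 2):
            overlap = len(user_items[u] & user_items[v])
            sim[(i, j)] += 1.0 / (alpha + overlap)
    return dict(sim)


# Made-up click data: three users, three products.
clicks = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "c"},
}
print(swing_similarity(clicks))
```

This brute-force version iterates over all item pairs and is only suitable for small examples; a production system would need a distributed or sampled computation.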

Prediction model: The business goal is to cultivate a west-oriented product pool and to use good products to drive opportunity products, so that local users gradually form a shopping mindset toward the platform. Therefore, we take product dpvCTR as the optimization goal and use CTR prediction models, including LR, LightGBM, and DeepFM.
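To make the CTR prediction idea concrete, here is a minimal logistic regression (the simplest of the model families listed above) trained with plain stochastic gradient descent. It is a teaching sketch only: the two features, the training rows, and the hyperparameters are invented, and the article does not describe the features or training setup actually used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_lr(rows, labels, lr=0.1, epochs=200):
    """Fit weights and bias by stochastic gradient descent on log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_ctr(model, x):
    """Predicted click probability for one feature vector."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical features per impression: [scaled rating, price competitiveness].
rows = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]]
labels = [1, 1, 0, 0]  # 1 = clicked, 0 = not clicked

model = train_lr(rows, labels)
print(predict_ctr(model, [0.9, 0.9]), predict_ctr(model, [0.1, 0.1]))
```

Candidates can then be ranked by predicted click probability; tree-based or deep models such as LightGBM and DeepFM play the same role with richer feature interactions.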

Refined selection of products with the same style: Among the recalled products, the better ones are selected according to DSR, price, and logistics, respectively.

Word Diagnosis

Objective: To diagnose, classify, and tag keyword opportunities in advance on the basis of insight into BA business and business operation experience, so that truly valuable opportunity queries can be screened efficiently.

Method: In addition to common indicators inside the site, we mainly provide evaluation indicators from outside the site, such as the UV index of searches outside the site and transaction indicators on competing websites (modeled as number of comments * price). These indicators offer guidance on how popular a word is outside the site and how well the related products sell there.
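The off-site indicators above could be combined along the following lines. The comments-times-price proxy follows the description in the text, while the thresholds, labels, and field names are hypothetical placeholders, since the actual criteria vary by business party.

```python
def transaction_proxy(listings):
    """Approximate off-site sales as the sum of comments * price per listing."""
    return sum(item["comments"] * item["price"] for item in listings)


def diagnose(keyword, offsite_uv_index, listings, uv_threshold=50, proxy_threshold=10_000):
    """Tag a keyword using assumed thresholds on off-site heat and sales."""
    proxy = transaction_proxy(listings)
    is_hot = offsite_uv_index >= uv_threshold
    is_selling = proxy >= proxy_threshold
    if is_hot and is_selling:
        label = "opportunity"
    elif is_hot:
        label = "watch"  # searched a lot, but sales are not yet proven
    else:
        label = "low priority"
    return {"keyword": keyword, "transaction_proxy": proxy, "label": label}


# Made-up competitor listings recalled for one keyword.
listings = [
    {"comments": 120, "price": 20.0},
    {"comments": 300, "price": 35.0},
]
print(diagnose("garden furniture", offsite_uv_index=80, listings=listings))
```

Keeping the thresholds as parameters reflects the point that each business party can supply its own criteria.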

Indicator criteria are divided into the following categories:


The indicator criteria vary according to business needs. Each business party can submit its own criteria to the data team to have them implemented, and then apply them in the opportunity center product.

Business Results

Product Campaign in Spain

So far, this capability has been successfully applied in the POC scenarios of the product campaign in Spain, and its value has been verified. This capability can be quickly reused in the supply of trend goods in France, Russia, and the United Kingdom. The scenario value verified in Spain includes:

  1. Collection topic exploring capability: Among the eight key themes operated in the product campaign, Christmas, Valentine's Day, and the skiing season were discovered from the data. The Valentine's Day theme (4.9% click rate) and the skiing theme of winter clothes clearance sales (5.2% click rate) are two of the top three themes.
  2. Valentine's Day theme POC (from exploring 500 Valentine's Day hot search words outside the site, to crawling products on competing websites, to matching products with the same style on the site), which made two contributions:
    • Expansion of the west-oriented premium products pool: A total of 8,960 products were obtained by matching products through 500 keywords outside the site. Of these, 3,085 products were not yet in the premium products pool, and 1,437 of them were added to it after trial sales.
    • Improvement of the product click rate through Collection ads: After one week of advertising, the click rate of Collection stabilized at about 5%, which is about 20% higher than the average rate in the Spanish market (4.15%).
  3. Christmas POC: We organized about 90,000 products with the same style as those on competing websites and added 30,000 products to the premium products pool. We advertised a Christmas comprehensive collection, a Christmas clothing collection, and a Christmas decoration collection. The total number of products sold in 10 days was about 12,000.

With the continuous improvement of word solutions and product capabilities, and starting from users' needs, we have launched a series of operational strategies for the business, such as association and application in search scenarios and the cell operation intelligence program. In this way we put into practice the idea of driving business growth through the combination of data and technology.
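The Valentine's Day POC figures reported above can be checked with simple arithmetic. The sketch below only recomputes ratios from the numbers quoted in this section; the variable names are ours.

```python
# Figures quoted in the Valentine's Day POC above.
matched_products = 8960      # products matched through 500 off-site keywords
not_in_pool = 3085           # matched products not yet in the premium pool
added_after_trial = 1437     # of those, added after trial sales
collection_ctr = 5.0         # percent, approximate stabilized click rate
market_avg_ctr = 4.15        # percent, Spanish market average


def relative_uplift(value, baseline):
    """Relative improvement of value over baseline."""
    return (value - baseline) / baseline


new_share = not_in_pool / matched_products
trial_pass_rate = added_after_trial / not_in_pool
ctr_uplift = relative_uplift(collection_ctr, market_avg_ctr)

print(f"share of matched products new to the pool: {new_share:.1%}")
print(f"share of new products added after trial:   {trial_pass_rate:.1%}")
print(f"click rate uplift over market average:     {ctr_uplift:.1%}")
```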


Application Results of SEO Local Word Library

We have cooperated with the SEO team to build a local word library, crawling the third-party keyword search tool Ahref as a word source, and have conducted an application experiment in SEO. The goal is to bring in traffic from both inside and outside the site free of charge. Based on data from the past week, the performance of Ahref is as follows:

  1. Ahref word sources have brought 31,669 UV to AE on average every day (the overall SEO word library has brought 317,736 UV), accounting for about 10% of the total.
  2. Ahref word sources bring an average of 141 new buyers per day, with an average of 1,459 orders per day.
  3. The IPVUV value of Ahref word sources is about twice that of all word sources (all word sources: $0.59, Ahref: $1.12). The hot words discovered outside the site clearly perform better than the words recorded in AE.


Application Results on Rankings

In March, we worked with the ranking business and operation personnel to build the ranking capability in the site, and we mainly optimized the word-based product selection solution in two aspects:

  1. Word selection: We focus on words that combine a category with a specific style or material (temporarily defined as CPV words), rather than treating them as plain category words.
  2. Product selection: For material words, style words, and category words, the products found through a word must clearly reflect the meaning of that word. We have improved the accuracy of product selection across the overall process.

The process was launched on March 28, with 94 keywords evaluated and 200 rankings established. The purpose of the POC is to complete the automatic ranking-building mechanism and to improve the efficiency of operation personnel in discovering local opportunity words, matching products, and posting products. For example, the following tag bank accumulates keyword trend tags.


Product Display

Even if we have obtained relevant data sources from outside and know what the trend is, there may be no good tool inside the site to reflect the trend immediately. A great deal of manual effort is wasted on repeatedly searching for BI prior products or fishing product titles out of keywords for prismatic application, and the recalled products are often inaccurate. After several POCs, we concluded that we need a platform that provides detailed keyword information and quickly delivers accurate word-based opportunity product selection. It is an indispensable tool for AE to capture trends outside the site.

Interface of opportunity word exploring: Based on the search ranking of keywords, historical trends, and conversion of L-D inside the site, operation personnel determine whether to create a product set for the application. The opportunity type is an indicator defined by BI that helps operation personnel quickly find effective keywords.


Interface of product set creation based on opportunity words: Products are viewed to form product sets on the basis of business decisions. The product sets have been applied to the search recommendation scenarios. They are implemented to build different premium product pools in different countries and are also used for the personalized album release (this word process module was launched at the opportunity center at the end of this month).


The products can automatically capture the following requirements:

1) Periodic hot spots

  • Summer clothing: T-shirt, dress, summer dress, bikini, sandals, skirt, sunglasses, top, pants
  • Outdoor sports: bicycle, mountain bike
  • Summer home: curtains, family swimming pool

2) Soaring hot spots in the site

  • Marketing-related: Mi Smart Band 6, POCO F3, Poco X3 Pro
  • Summer clothing: swimsuit man, shorts
  • Outdoor activities: fishing
  • Home: ceiling lamp, outdoor solar lights, barbecue

3) Soaring hot spots outside the site

  • Folk festivals: wedding dress, engagement rings, marriage bedrooms
  • Outdoor activities and devices: barbecue, swimming pool, garden furniture

Recent application effects: the keyword click rate of ES is 30%-50% higher than that of the overall market.


Future Plans

At present, we are further improving our word solution capability in two directions: capability building and scenario extension. To make word-based product selection more accurate in the future, we will improve our ability to optimize the search relevance of words and to classify users. By doing so, we want to understand the relationship between words and users and the categories of products that users are interested in. In addition, we will extend the word solution to other fields. Since search words represent user demand (or traffic) and our capability is to accurately locate the products behind the words, the scenarios we will expand to mainly include: rankings in the site, opportunity products on SEO rankings outside the site, and PPC opportunity products.

For a process that started from scratch, finding a data solution required a large amount of input and discussion. Under the existing data capabilities and based on our pain points, we needed to find a unified and interpretable solution, drawn from a large number of external data sources, that can be applied in AE. We held thorough discussions with business, algorithm, and engineering personnel on the business rationale of each detail and the potential risks of the data technology. Many thanks to the whole group for their cooperation.
