OpenSearch: Overview

Last Updated: Mar 03, 2023

What is category prediction?

Optimizing search engine performance is a broad topic. In the query intent understanding stage, you can use optimization methods such as semantic understanding, named entity recognition (NER), term weight analysis, and spelling correction. In the sorting stage, you can use optimization methods such as text relevance analysis, popularity models, and category prediction. You can also configure query analysis rules, adjust sort expressions, and run A/B tests to compare the search performance of different optimization policies.

This topic describes category prediction. A search query usually matches multiple commodities. The system calculates the relevance between the search query and the category of each commodity. If this relevance is referenced in a sort expression, a commodity whose category is more relevant to the search query receives a higher sort score and therefore ranks higher.

For example, if you search for Bright, the search results may contain commodities of the milk category and commodities of the rice category. You can train a category prediction model on behavioral data such as the following: among users who search for Bright, more users click commodities of the milk category than commodities of the rice category. The model then predicts that Bright is more relevant to the milk category than to the rice category. Therefore, when the system calculates sort scores, commodities of the milk category score higher and rank higher than commodities of the rice category. In other words, the model infers that a user who searches for Bright most likely intends to find a commodity of the milk category. Ranking those commodities first increases the business value of searches.
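The Bright example can be illustrated with a minimal Python sketch. It is not the OpenSearch model: the click log, the click-share relevance, the `sort_score` formula, and the `weight` parameter are all assumptions made for illustration. It only shows how a category relevance value, once referenced in a sort score, moves commodities of the more relevant category up.

```python
from collections import Counter

# Hypothetical click log: (search query, category of the clicked commodity).
clicks = [
    ("Bright", "milk"), ("Bright", "milk"), ("Bright", "milk"),
    ("Bright", "rice"),
]

def category_relevance(query, click_log):
    """Share of clicks per category for one query (a toy stand-in for the model)."""
    counts = Counter(cat for q, cat in click_log if q == query)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}

def sort_score(text_relevance, category, relevance, weight=1.0):
    """Illustrative sort expression: text relevance plus weighted category relevance."""
    return text_relevance + weight * relevance.get(category, 0.0)

relevance = category_relevance("Bright", clicks)

# Two commodities with identical text relevance; only the category differs.
commodities = [
    {"title": "Bright rice 5 kg", "category": "rice", "text_relevance": 0.5},
    {"title": "Bright fresh milk", "category": "milk", "text_relevance": 0.5},
]
ranked = sorted(
    commodities,
    key=lambda c: sort_score(c["text_relevance"], c["category"], relevance),
    reverse=True,
)
for c in ranked:
    print(c["title"])
```

With equal text relevance, the commodity of the milk category is listed first because the click data makes the milk category more relevant to Bright.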

Basic principles

The objective of category prediction is to predict the relevance between a search query and a specific category. To do so, you train a model on three types of data: historical search queries, behavioral data on clicks after searches, and information about the commodities of the category. The trained model describes the relevance between a search query and the category.

A data source that stores commodity data is required to train a model. When you create a category prediction model, you must first associate the model with an application. Then, you can determine the three types of data required to train the model.

Note

1. You can obtain all historical search queries on the application by adding the raw_query parameter in a search request.

2. You can obtain category data and commodity data by specifying fields in the application when you prepare for model training. You must specify at least the fields that record category IDs and commodity titles in the application.

3. You must add instrumentation to the application to report behavioral data on clicks. The more comprehensive and higher-quality the reported data is, the more features the model can use and the better it performs.

OpenSearch provides two options for you. You can train the model with or without behavioral data.

Model training with behavioral data is suitable for scenarios in which behavioral data is uploaded. When the training starts, the entry conditions are automatically checked to ensure that the amount, quality, and integrity of the data meet the requirements. The training then proceeds through the following steps.

Note

1. Use historical search queries and the information about categories to generate sample data, and use behavioral data to label the sample data.

2. Generate click features by collecting index statistics and performing feature calculations on the behavioral data.

3. After the search queries and commodity titles of the categories are analyzed, calculate the semantic features of the text of the search queries and commodity titles.

4. If behavioral data on transactions is uploaded, collect index statistics and perform feature calculations on the behavioral data to generate transaction features. The model uses these features to rank commodities that have better transaction performance higher.

5. Combine the sample data with its click features, semantic features, transaction features, and labels to form the training data. Import the training data to the algorithm for iterative training.

6. After the training is complete, a model that describes the relevance between sample search queries and categories is obtained. Use this model to predict the relevance between a search query and a specific category.
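The data preparation in steps 1 to 5 can be sketched as follows. This is a simplified illustration under stated assumptions, not the OpenSearch training pipeline: the input data, the `click_share` feature, and the token-overlap `semantic` feature are hypothetical stand-ins, and transaction features are omitted because no transaction data is assumed.

```python
from collections import defaultdict

# Hypothetical inputs.
queries = ["Bright", "Bright", "jasmine rice"]   # historical search queries
category_titles = {                               # category ID -> commodity titles
    "milk": ["Bright fresh milk", "Bright yogurt"],
    "rice": ["Bright jasmine rice"],
}
click_log = [                                     # (query, category of clicked commodity)
    ("Bright", "milk"), ("Bright", "milk"),
    ("Bright", "rice"), ("jasmine rice", "rice"),
]

def tokens(text):
    return set(text.lower().split())

def semantic_feature(query, titles):
    """Best token overlap between the query and any title (toy semantic feature)."""
    q = tokens(query)
    return max(len(q & tokens(t)) / len(q | tokens(t)) for t in titles)

def build_training_rows(queries, category_titles, click_log):
    # Step 2: collect click statistics from the behavioral data.
    clicks = defaultdict(int)
    per_query = defaultdict(int)
    for q, cat in click_log:
        clicks[(q, cat)] += 1
        per_query[q] += 1
    rows = []
    # Step 1: one sample per (query, category) pair, labeled by clicks.
    for q in sorted(set(queries)):
        for cat, titles in category_titles.items():
            n = clicks[(q, cat)]
            rows.append({
                "query": q,
                "category": cat,
                "click_share": n / per_query[q] if per_query[q] else 0.0,
                "semantic": semantic_feature(q, titles),   # step 3
                "label": 1 if n > 0 else 0,
            })
    # Step 5: each row combines the sample, its features, and its label.
    return rows

rows = build_training_rows(queries, category_titles, click_log)
for row in rows:
    print(row)
```

Each resulting row is one training example. A real pipeline would feed such rows to a learning algorithm for iterative training rather than print them.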

Model training without behavioral data is suitable for scenarios in which behavioral data is not uploaded or the quality of behavioral data is unsatisfactory. You only need to specify the fields that record category IDs and commodity titles in the application to start training the model. Because no behavioral data is available, no basis exists for labeling the sample data. Instead, after the search queries and commodity titles of the categories are analyzed, the semantic relevance between the text of the search queries and the commodity titles is calculated and used as the relevance between the search queries and categories.
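A minimal sketch of this idea, assuming a simple token-overlap (Jaccard) measure in place of the actual semantic model, which this topic does not describe. The function names and the sample titles are hypothetical.

```python
def tokenize(text):
    return set(text.lower().split())

def jaccard(a, b):
    """Token overlap between two token sets, in the range 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def category_relevance_without_behavior(query, titles_by_category):
    """Average overlap between the query and the commodity titles of each category."""
    q = tokenize(query)
    scores = {}
    for category_id, titles in titles_by_category.items():
        if titles:
            scores[category_id] = sum(jaccard(q, tokenize(t)) for t in titles) / len(titles)
        else:
            scores[category_id] = 0.0
    return scores

# Hypothetical category ID -> commodity titles taken from the application fields.
titles_by_category = {
    "milk": ["fresh whole milk", "low fat milk"],
    "rice": ["jasmine rice 5 kg"],
}
scores = category_relevance_without_behavior("fresh milk", titles_by_category)
best = max(scores, key=scores.get)
print(best, scores)
```

Only the category ID and commodity title fields are needed here, which matches the minimum input for training without behavioral data.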

In theory, a model that is trained with behavioral data performs better than a model that is trained without behavioral data. A model with more features describes the relevance between search queries and categories more comprehensively, so its predictions are more accurate.

Regardless of whether the model is trained with or without behavioral data, the training involves a large number of experiments, data from different scenarios, and careful parameter tuning to ensure that the results are as expected.

Use category prediction

Requirements

Before you train a model, you must prepare a data source and associate the model with an application. The category prediction model requires the category and commodity data, historical search queries, and behavioral data of the application.

If behavioral data on clicks is not uploaded, does not meet the training conditions, or is data that you do not want to use for training, you can train the model without behavioral data. In this case, you select fields from the application for the following three types of data: category ID, commodity title, and category name. The fields that record category IDs and commodity titles are required, and the field that records category names is optional. After the model training is complete, specific prediction results of the model are exported for performance evaluation. Category names are used on the performance evaluation page to manually evaluate whether the relevance between search queries and categories meets the expectations. We recommend that you select the field that records category names when you train the model.

If the behavioral data on clicks is uploaded, you can select the fields related to the behavioral data in addition to the preceding fields when you train the model. Provided that the behavioral data meets the training conditions, the data is used to train the model.
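The field requirements can be expressed as a small pre-flight check. This is a hypothetical helper, not part of OpenSearch: the role names `category_id`, `commodity_title`, and `category_name`, and the application field names `cate_id` and `title`, are assumptions chosen for illustration.

```python
REQUIRED_ROLES = ("category_id", "commodity_title")
OPTIONAL_ROLES = ("category_name",)

def check_field_mapping(mapping):
    """Check a role -> application field mapping before training.

    Returns (missing required roles, warnings about optional roles).
    """
    missing = [role for role in REQUIRED_ROLES if not mapping.get(role)]
    warnings = [
        f"{role} is not selected; it is recommended for performance evaluation"
        for role in OPTIONAL_ROLES
        if not mapping.get(role)
    ]
    return missing, warnings

# Hypothetical application field names.
missing, warnings = check_field_mapping(
    {"category_id": "cate_id", "commodity_title": "title"}
)
print("missing:", missing)
print("warnings:", warnings)
```

In this example the required fields are present, so training could start, but the check warns that the category name field is not selected.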

Procedure

  1. Create a category prediction model on an application.

  2. Apply the category prediction model: Apply the model to query analysis and then apply the model to rough and fine sorts.

  3. Create a query analysis rule, configure category prediction, and then select the model created in Step 1.

  4. Make the category prediction model take effect in a search request: Use an SDK to call operations and specify the raw_query parameter.

For more information, see Use category prediction.
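As a rough illustration of Step 4, the following sketch assembles request parameters that carry the raw_query parameter. Only the raw_query parameter name comes from this topic. The `build_search_params` helper, the `query` clause format, and the `default` index name are assumptions, and the sketch does not call any SDK. Refer to the SDK documentation for the actual request format.

```python
from urllib.parse import urlencode

def build_search_params(user_input, index="default"):
    """Assemble search request parameters (illustrative, not the SDK API)."""
    return {
        # Hypothetical query clause format, for illustration only.
        "query": f"{index}:'{user_input}'",
        # The unmodified text that the user typed. Category prediction
        # uses it, and it is recorded as a historical search query.
        "raw_query": user_input,
    }

params = build_search_params("Bright")
query_string = urlencode(params)
print(query_string)
```

Passing the original user input in raw_query, separately from the structured query clause, keeps the text that category prediction analyzes free of query syntax.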