What is category prediction?
Optimizing the search quality of a search engine is a broad topic. In the query intent understanding stage, you can use methods such as semantic understanding, named entity recognition (NER), term weight analysis, and spelling correction. In the sorting stage, you can use methods such as text relevance analysis, popularity models, and category prediction. You can also configure query analysis rules, adjust sort expressions, and run A/B tests to compare how different optimization policies affect search results.
This topic describes category prediction. A search query typically matches multiple commodities. For each matched commodity, the system calculates the relevance between the search query and the category of the commodity. If the corresponding sort expression references this relevance, a higher relevance yields a higher sort score, and the commodity ranks higher.
For example, if a user searches for Bright, the search results may contain commodities of both the milk category and the rice category. You can train a category prediction model on behavioral data. Suppose that, among users who search for Bright, more users click commodities of the milk category than commodities of the rice category. The model then predicts that Bright is more relevant to the milk category than to the rice category. When the system calculates sort scores, commodities of the milk category therefore score higher and rank higher than commodities of the rice category. In other words, the model predicts that a user who searches for Bright is more likely looking for a commodity of the milk category. Ranking such commodities higher increases the business value of searches.
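The Bright example can be sketched in a few lines of Python. The click log, the commodity records, and the additive scoring formula are all invented for illustration; this is a rough sketch of the idea, not the model or the sort expression syntax that OpenSearch uses.

```python
from collections import Counter

# Hypothetical click log: (search query, category of the clicked commodity).
click_log = [
    ("Bright", "milk"), ("Bright", "milk"), ("Bright", "milk"),
    ("Bright", "rice"),
]

def category_relevance(query, log):
    """Estimate relevance as the share of the query's clicks that each category received."""
    counts = Counter(category for q, category in log if q == query)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()} if total else {}

relevance = category_relevance("Bright", click_log)

# Hypothetical commodities that share the same base text score.
commodities = [
    {"title": "Bright rice 5 kg", "category": "rice", "text_score": 1.0},
    {"title": "Bright fresh milk 1 L", "category": "milk", "text_score": 1.0},
]

def sort_score(item, relevance, weight=1.0):
    # Assumed formula: base text score plus a weighted category relevance term.
    return item["text_score"] + weight * relevance.get(item["category"], 0.0)

ranked = sorted(commodities, key=lambda c: sort_score(c, relevance), reverse=True)
for item in ranked:
    print(item["title"], round(sort_score(item, relevance), 2))
```

Because three of the four clicks for Bright went to the milk category, the milk commodity receives the larger relevance term and is listed first.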
The objective of category prediction is to predict the relevance between a search query and a specific category. To do so, you train a model on three inputs: historical search queries, the click behavioral data that follows those searches, and the information about the commodities of each category.
Training a model requires a data source that stores commodity data. When you create a category prediction model, you must first associate the model with an application. The application then supplies the three types of data required to train the model:
1. Historical search queries. All historical search queries on the application are obtained from the log data in the background of the application.
2. Category data and commodity data. You specify the corresponding fields in the application when you prepare for model training. You must specify at least the category ID field and the commodity title field.
3. Click behavioral data. You must add instrumentation to the application to report click behavioral data. The more comprehensive the reported data, the higher the data quality.
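To make the three data types concrete, the following sketch shows one record of each kind and a check that the two required fields are mapped. Every field name here (`cate_id`, `title`, `item_id`, and the role names) is hypothetical; the actual field names depend on your application schema and on the behavioral data format that OpenSearch expects.

```python
# Hypothetical records for the three data types (all field names are invented).
search_log_entry = {"query": "Bright"}                   # 1. historical search query
commodity = {"item_id": "1001", "cate_id": "12",         # 2. category and commodity data
             "title": "Bright fresh milk 1 L"}
click_event = {"query": "Bright", "item_id": "1001"}     # 3. reported click behavior

# Roles that must be mapped to application fields before training can start.
REQUIRED_ROLES = ("category_id", "commodity_title")

def missing_required_roles(field_mapping):
    """Return the required roles that are not mapped to an application field."""
    return [role for role in REQUIRED_ROLES if not field_mapping.get(role)]

field_mapping = {"category_id": "cate_id", "commodity_title": "title"}
print(missing_required_roles(field_mapping))
```

An empty result means that both required fields are mapped.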
OpenSearch provides two training options: you can train the model with or without behavioral data.
Model training with behavioral data is suitable for scenarios where behavioral data is uploaded. When the training starts, the system automatically checks the entry conditions to ensure that the amount, quality, and integrity of the data meet the requirements. The training then proceeds as follows:
1. Sample data is generated from historical search queries and category information, and is labeled by using behavioral data.
2. Click behavior features are generated by collecting index statistics and performing feature calculation on the behavioral data.
3. The search queries and the commodity titles of the categories are analyzed, and semantic features are calculated from the text of the search queries and commodity titles.
4. If transaction behavioral data is uploaded, transaction features are generated by collecting index statistics and performing feature calculation on the transaction behavioral data. These features let the model rank commodities that have better transaction performance higher.
5. The sample data is combined with its behavior features, semantic features, transaction features, and labels to form the training data, which is imported to the algorithm for iterative training.
6. After the training is complete, the result is a model that describes the relevance between the sample search queries and categories. This model is used to predict the relevance between an unknown search query and a specific category.
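The six steps above can be illustrated with a toy pipeline. This is only a sketch under strong assumptions: the click log and titles are invented, the click feature is a simple click share, the semantic feature is a token overlap, transaction features (step 4) are omitted, and logistic regression trained by gradient descent stands in for the unspecified training algorithm.

```python
import math
from collections import Counter

# Toy inputs (all invented): commodity titles per category and a click log.
titles = {
    "milk": ["organic whole milk", "fresh milk 1l"],
    "rice": ["jasmine rice 5kg", "brown rice 1kg"],
}
clicks = Counter({
    ("fresh milk", "milk"): 9, ("fresh milk", "rice"): 1,
    ("whole milk", "milk"): 7,
    ("jasmine rice", "rice"): 8, ("jasmine rice", "milk"): 1,
    ("brown rice", "rice"): 5,
})
queries = sorted({q for q, _ in clicks})

def click_share(query, category):
    """Step 2: click feature, the share of the query's clicks that went to the category."""
    total = sum(n for (q, _), n in clicks.items() if q == query)
    return clicks[(query, category)] / total if total else 0.0

def semantic_overlap(query, category):
    """Step 3: semantic feature, the fraction of query tokens found in the category's titles."""
    vocab = {tok for title in titles[category] for tok in title.split()}
    tokens = query.split()
    return sum(tok in vocab for tok in tokens) / len(tokens) if tokens else 0.0

def features(query, category):
    return [click_share(query, category), semantic_overlap(query, category)]

# Steps 1 and 5: one sample per (query, category) pair, labeled 1 if the
# category received at least half of the query's clicks.
samples = [(features(q, c), 1.0 if click_share(q, c) >= 0.5 else 0.0)
           for q in queries for c in titles]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Step 5: iterative training, here batch gradient descent on the logistic loss.
weights, bias, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(500):
    grad_w, grad_b = [0.0, 0.0], 0.0
    for x, y in samples:
        err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
        grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
        grad_b += err
    weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
    bias -= lr * grad_b / len(samples)

def predict(query, category):
    """Step 6: predicted relevance between a query and a category."""
    x = features(query, category)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# "organic milk" never appears in the click log, so only the semantic feature applies.
print(predict("organic milk", "milk"), predict("organic milk", "rice"))
```

For the unseen query, the click feature is zero for both categories, so the prediction rests on the semantic feature alone. This is why the semantic features matter even when behavioral data is available.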
Model training without behavioral data is suitable for scenarios where behavioral data is not uploaded or the quality of the behavioral data is poor. You only need to specify the category ID field and the commodity title field in the application to start training the model. Because no behavioral data exists, no basis exists for labeling the sample data. Instead, the system analyzes the search queries and the commodity titles of the categories and calculates the semantic relevance between the text of the search queries and the commodity titles. This semantic relevance is used as the relevance between the search queries and the categories.
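As a rough illustration of relevance derived from text alone, the following sketch scores a query against the commodity titles of each category. The bag-of-words cosine similarity is a stand-in chosen for brevity, because this topic does not state how the semantic relevance is actually calculated. The titles are invented.

```python
import math
from collections import Counter

# Hypothetical commodity titles, grouped by category.
titles_by_category = {
    "milk": ["Bright fresh milk 1L", "Bright yogurt"],
    "rice": ["Bright rice 5kg", "Northeast rice"],
}

def tokenize(text):
    return text.lower().split()

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def semantic_relevance(query, category):
    """Compare the query text with all titles of the category, no click data involved."""
    query_vec = Counter(tokenize(query))
    category_vec = Counter(tok for title in titles_by_category[category]
                           for tok in tokenize(title))
    return cosine(query_vec, category_vec)

for category in titles_by_category:
    print(category, round(semantic_relevance("fresh milk", category), 3))
```

A query such as "fresh milk" shares tokens only with the milk titles, so it scores above zero for milk and zero for rice. A query such as "Bright" matches titles in both categories, which is the kind of ambiguity that text alone cannot fully resolve and that behavioral data helps with.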
In theory, a model that is trained with behavioral data performs better than a model that is trained without behavioral data. A model with more features describes the relevance between search queries and categories more comprehensively, so its predictions are more accurate.
Regardless of whether the model is trained with or without behavioral data, the training involves a large number of experiments, data from different scenarios, and careful parameter tuning to ensure that the results are as expected.
Use category prediction
Model training requires a data source to generate sample data, which is used to perform feature calculation and train the model. Therefore, you must associate the model with an application before you train the model. After the association, the category prediction model can use the application data, the search query data, and the behavioral data of the application. This is the data that the category prediction model is trained on.
You can train the model without click behavioral data in any of the following cases: click behavioral data is not uploaded, you do not want to use the click behavioral data for training, or the click behavioral data does not meet the entry conditions of training. In this case, select the following three fields from the application for the category prediction model: category ID, commodity title, and category name. The category ID and commodity title fields are required, and the category name field is optional. After the model training is complete, specific prediction results of the model are exported for manual evaluation. The effect evaluation page uses the category name to help you manually evaluate whether the relevance between search queries and categories meets expectations. For this reason, we recommend that you select the category name field when you train the model.
If click behavioral data is uploaded, you can also select the fields related to the behavioral data, in addition to the preceding fields, when you train the model. If the behavioral data meets the entry conditions of the training, the data is used to train the model in the background.
1. Create a category prediction model on an application.
2. Apply the category prediction model: apply the model in query analysis, and then apply the model in rough and fine sorts. To apply the model in query analysis, create a query analysis rule, configure category prediction, and select the model created in Step 1.
3. Make the category prediction model take effect in queries: use SDKs to call the query operations and specify the raw_query parameter.
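The following sketch shows how a request might carry the raw_query parameter alongside the search query. Only the parameter name raw_query comes from this topic. The helper name `build_search_params`, the layout of the `query` value, and the reading that raw_query holds the user's original search text are assumptions; check the SDK reference for the exact parameters before you rely on them. The sketch builds the parameters only and does not call any SDK.

```python
from urllib.parse import urlencode

def build_search_params(user_input, start=0, hit=10):
    """Build request parameters for a search call (illustrative layout only)."""
    return {
        # Assumed clause layout for the search query itself.
        "query": f"query=default:'{user_input}'&&config=start:{start},hit:{hit}",
        # Assumed meaning: the text the user typed, passed through unchanged
        # so that category prediction can work on it.
        "raw_query": user_input,
    }

params = build_search_params("Bright")
print(urlencode(params))
```

Keeping the parameter construction in one helper makes it harder to send a search request without raw_query, in which case category prediction would presumably not be applied.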
For more information, see Use the category prediction feature.