What is a popularity model?
Popularity models are computed offline. They use sort algorithms and serve as the basis of the search services in Taobao. A popularity model quantifies the static quality and popularity of each commodity as a single value, which is called the popularity score. Although popularity models originated in Taobao services, they also apply to other search scenarios, such as non-commodity search, where the models calculate the popularity of indexed documents.
Features used for model training
Entity dimension: product or document, brand, merchant, leaf categories, and level 1 categories.
Time period dimension: 1 day, 3 days, 7 days, 14 days, 30 days, and time decay weighting.
Behavior dimension: exposure, click, add to favorites, add to cart, purchase, comment, and likes.
Statistics dimension: count, number of customers, frequency, click-through rate, and conversion rate.
You can define a feature by combining one or two items from each dimension. A popularity model calculates the value of the feature based on historical data. For example, a feature can be the number of times (statistics) a commodity (entity) is exposed (behavior) within the last day (time period), or the sales (behavior and statistics) of the merchant (entity) to which a product belongs within the last 30 days (time period). The set of features that can be defined in this way is the Cartesian product of the four dimensions.
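The feature definition above can be sketched in a few lines of Python. This is an illustration only, not how OpenSearch computes features: the identifiers (`item`, `favorite`, `ctr`, and so on), the event record layout, and the helper names `enumerate_features` and `count_feature` are all hypothetical. For simplicity, the sketch picks exactly one item from each dimension, although the text allows up to two.

```python
from datetime import datetime, timedelta
from itertools import product

# Dimension values taken from the lists above; the short names are hypothetical.
ENTITIES = ["item", "brand", "merchant", "leaf_category", "level1_category"]
TIME_PERIODS = ["1d", "3d", "7d", "14d", "30d", "time_decay"]
BEHAVIORS = ["exposure", "click", "favorite", "cart", "purchase", "comment", "like"]
STATISTICS = ["count", "customers", "frequency", "ctr", "cvr"]


def enumerate_features():
    """Return one feature name per element of the four-way Cartesian product."""
    return [
        "_".join(combo)
        for combo in product(ENTITIES, TIME_PERIODS, BEHAVIORS, STATISTICS)
    ]


def count_feature(events, entity_key, entity_id, behavior, days, now):
    """Count events of one behavior for one entity within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return sum(
        1
        for e in events
        if e[entity_key] == entity_id
        and e["behavior"] == behavior
        and e["time"] >= cutoff
    )


# Hypothetical behavior log, made up for this example.
NOW = datetime(2024, 1, 31)
EVENTS = [
    {"item": "A", "merchant": "M1", "behavior": "exposure", "time": datetime(2024, 1, 30, 12)},
    {"item": "A", "merchant": "M1", "behavior": "exposure", "time": datetime(2024, 1, 20)},
    {"item": "B", "merchant": "M1", "behavior": "purchase", "time": datetime(2024, 1, 10)},
]

features = enumerate_features()

# Exposures of commodity A within the last day.
exposures_1d = count_feature(EVENTS, "item", "A", "exposure", 1, NOW)

# Sales of merchant M1 within the last 30 days.
merchant_sales_30d = count_feature(EVENTS, "merchant", "M1", "purchase", 30, NOW)
```

With one item per dimension, the product contains 5 × 6 × 7 × 5 combinations. Not every combination is necessarily meaningful in practice; the sketch only shows how the feature space is spanned.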
To use a popularity model, perform the following steps:
Create a popularity model.
Train the model and check the data report.
Apply the popularity score to a sort policy.
Note: Each application supports a maximum of five popularity models.
Create a popularity model
1. In the OpenSearch console, choose Search Algorithm Center > Sort Configuration. On the Policy Management page, click Popularity Model Management in the left-side pane. On the Popularity Model Management page, click Create. On the Create Popularity Model page, set the required parameters in the Basic Information step.
2. In the Data Source Configuration step, activate the server-side collection feature.
3. After the server-side collection feature is activated, the following information appears:
Status: the status of the behavior data that is collected on the server side. If the data fails data verification, Abnormal and Unavailable appears, as shown in the following figure. You can click View Data Report to view the details on the Data Report page. If the data passes data verification, Normal and Available appears.
Last Updated On: the date when the behavior data was last updated.
Latest Updated Records: the number of raw data records in the latest update.
Activated On: the date when the server-side collection feature was first activated.
Total Behavior Data Records: the total number of behavior data records. All updated and deleted records are counted.
Case 1: The collected behavior data is in the Abnormal and Unavailable state. You cannot continue to create the popularity model.
Case 2: The collected behavior data is in the Normal and Available state. You can continue to create the popularity model.
4. Complete the creation.
5. After the popularity model is created, you can perform an A/B test on the model and evaluate its performance. You can also publish the model to calculate the popularity score and apply the popularity score to a sort policy.
Note: The popularity() function returns a value from 0 to 1.
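Because the popularity score is bounded between 0 and 1, it can be combined with other normalized scores in a sort policy. The following Python sketch shows one possible way to blend the two. It is not OpenSearch sort expression syntax, and the function `final_score`, the weight `popularity_weight`, and the sample scores are assumptions made for this example.

```python
def clamp01(value):
    """Restrict a score to the range [0, 1]."""
    return max(0.0, min(1.0, value))


def final_score(relevance, popularity, popularity_weight=0.3):
    """Blend a relevance score with a popularity score (hypothetical weighting)."""
    if not 0.0 <= popularity_weight <= 1.0:
        raise ValueError("popularity_weight must be between 0 and 1")
    return (
        (1.0 - popularity_weight) * clamp01(relevance)
        + popularity_weight * clamp01(popularity)
    )


# Hypothetical documents: (id, relevance score, popularity score).
docs = [
    ("doc_a", 0.9, 0.1),
    ("doc_b", 0.7, 0.95),
    ("doc_c", 0.4, 0.5),
]

# Sort by the blended score, highest first.
ranked = sorted(docs, key=lambda d: final_score(d[1], d[2]), reverse=True)
```

In this made-up data set, a document with slightly lower relevance but much higher popularity can overtake a more relevant but unpopular one. How strongly popularity should influence the ranking is a tuning decision that an A/B test can inform.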
Details of a popularity model
Popularity Model Management
Model Name: the name of the popularity model that you specify when you create the model.
Objective: the objective of the popularity model that you specify when you create the model. Only the objective value of the click-through rate is supported.
Available Model: indicates whether an available model is generated after the latest training and whether the area under the curve (AUC) value exceeds the specified threshold. If an available model is generated and the AUC value exceeds the threshold, Yes appears in this column. Otherwise, No appears.
Latest Training Time: the date when the model was last trained.
Latest Training Status: the status of the latest training. When behavior data is imported for the first time or incremental data is imported, the data is verified. If the data passes the verification, the Pending Training state appears. Otherwise, the Data Exception state appears. The Data Exception state appears regardless of whether the full behavior data or the incremental data fails the verification. After you start training for a popularity model that is in the Pending Training state, the status changes to Training. If the training fails, the status changes to Training Failed. If the training succeeds and the AUC value is greater than 0.8, the status changes to Trained and Passed. If the training succeeds but the AUC value is smaller than 0.8, the status changes to Trained but Failed.
AUC of Latest Training: the AUC value of the latest training. Only popularity models whose latest training is complete have an AUC value. If the latest training of a popularity model is not complete or has failed, no AUC value is generated for the training. In this case, a hyphen (-) appears in this column.
Sort: indicates whether the popularity model is referenced by a sort policy. If the popularity model is referenced by a sort policy, Referenced appears in this column. If the popularity model is not referenced by a sort policy, Not Referenced appears in this column. If no available model is generated after the latest training of the popularity model, a hyphen (-) appears in this column.
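The column values described above follow a small set of rules, which the following Python sketch restates. It is a reading aid rather than the console's implementation: the function names and the `outcome` values (`None`, `"running"`, `"failed"`, `"succeeded"`) are hypothetical. The text does not say which status applies when the AUC value is exactly 0.8, so the sketch assumes that such a model does not pass.

```python
from typing import Optional

AUC_THRESHOLD = 0.8  # threshold stated in the Latest Training Status description


def latest_training_status(
    data_verified: bool,
    outcome: Optional[str] = None,
    auc: Optional[float] = None,
) -> str:
    """Map verification and training results to the Latest Training Status value."""
    if not data_verified:
        # Applies to both full and incremental behavior data.
        return "Data Exception"
    if outcome is None:
        return "Pending Training"
    if outcome == "running":
        return "Training"
    if outcome == "failed":
        return "Training Failed"
    if outcome == "succeeded":
        if auc is None:
            raise ValueError("a successful training must report an AUC value")
        # Assumption: an AUC of exactly 0.8 is treated as not passed.
        return "Trained and Passed" if auc > AUC_THRESHOLD else "Trained but Failed"
    raise ValueError(f"unknown outcome: {outcome!r}")


def auc_column(outcome: Optional[str], auc: Optional[float]) -> str:
    """Return the AUC of Latest Training value, or a hyphen if none exists."""
    if outcome == "succeeded" and auc is not None:
        return str(auc)
    return "-"


def sort_column(model_available: bool, referenced: bool) -> str:
    """Return the Sort column value."""
    if not model_available:
        return "-"
    return "Referenced" if referenced else "Not Referenced"
```

For example, `latest_training_status(True, "succeeded", 0.85)` yields Trained and Passed, while the same call with an AUC value of 0.75 yields Trained but Failed.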
Details page of a popularity model
Available: indicates whether an available model is generated after the latest training of the popularity model. If an available model is generated, Available appears in this column. Otherwise, Unavailable appears in this column.
Last Trained On: the date when the model was last trained.
View Data Report: allows you to view the report on data quality.
Data Source: the data source that you select when you create the popularity model. You can select only Server-side Collection.
Objective: the objective of the popularity model that you specify when you create the model. Only the objective value of the click-through rate is supported. OpenSearch does not allow you to change the objective value for popularity models because this value is critical for determining the scenario type.
Scheduled Training: indicates whether scheduled training is enabled. Scheduled training can be performed regardless of the status of the popularity model. You can configure scheduled training at any time.
Referenced: indicates whether the popularity model is referenced by a sort policy. This information appears only when an available model is generated after the latest training of the popularity model. You can click Sort Policy on the right to go to the Policy Management page to reference the popularity model in a sort policy.