Overview
Click-through rate (CTR) prediction is a core task of a search platform. It predicts the probability that a user clicks a document that matches the user's query after the document is exposed. The predicted value can be used in sort scripts to improve search performance and business metrics such as CTR.
Benefits
OpenSearch supports CTR prediction models to better meet the needs of search result sorting in various scenarios. You can create and train CTR prediction models to implement personalized sorting of search results.
Create and train a model
Create an Industry Algorithm Edition configuration. Then, log on to the OpenSearch console and choose Search Algorithm Center > Sort Configuration in the left-side navigation pane. On the Policy Management page, click CTR Prediction Models in the left-side pane. On the CTR Prediction Models page, click Create to create a CTR prediction model.

Enter a model name and specify the training fields.

Map Training Fields: The Commodity ID and Commodity Title fields are required. The more fields you map, the better the model generally performs.

After the model is created, find the model on the CTR Prediction Models page and click Train in the Actions column.

After the training starts, view the training progress on the model details page.

After the training is complete and the model status changes to Available, you can use the model. If the model status changes to Unavailable, adjust the data based on the upgrade conditions for integrity levels in the Data Verification section. After the upgrade condition is met, train the model again the next day. If you have any questions, submit a ticket to contact technical support.

We recommend that you enable scheduled training to train the model daily.
Perform a search test
In the left-side navigation pane, choose Search Algorithm Center > Sort Configuration > Policy Management. On the Sort Configuration page, click Create to create a Cava-based fine sort policy.

Configure the Policy Name parameter, select Fine Sort from the Scope drop-down list, select Cava Script from the Type drop-down list, and then click Next.

Click Add Script File and copy the sample Cava script to the script editor. Click Compile. If the compilation is successful, click Save and then Publish.
Then, you can perform a search test.


Sample Cava script:
package users.scorer;

import com.aliyun.opensearch.cava.framework.OpsScoreParams;
import com.aliyun.opensearch.cava.framework.OpsScorerInitParams;
import com.aliyun.opensearch.cava.framework.OpsRequest;
import com.aliyun.opensearch.cava.framework.OpsDoc;
import com.aliyun.opensearch.cava.features.algo.AlgoModel;

class BasicSimilarityScorer {
    boolean init(OpsScorerInitParams params) {
        return true;
    }

    double score(OpsScoreParams params) {
        double score = 0;
        return score;
    }
};

class IntelligenceAlgorithmScorer {
    AlgoModel _algoModel;

    boolean init(OpsScorerInitParams params) {
        // The tf_checkpoint parameter is a fixed parameter.
        // Replace the last argument with the name of your CTR prediction model.
        _algoModel = AlgoModel.create(params, "tf_checkpoint", "ctr", "Name of your CTR prediction model");
        return true;
    }

    double score(OpsScoreParams params) {
        OpsDoc doc = params.getDoc();
        // Evaluate the CTR prediction model for the current document.
        double modelScore = _algoModel.evaluate(params);
        // Write the model score to the trace output of the document.
        doc.trace("ctrModelScore: ", modelScore);
        double score = modelScore + 700;
        return score;
    }
};

Perform a search test.

Note:
The second_rank_type, second_rank_name, and raw_query parameters are required in search requests.
The model performs better if the user_id parameter is included in both the behavioral data and the search requests.
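
As a rough illustration of the note above, the following Java sketch assembles the parameters of a search request and checks that the three required ones are present before the request is sent. The SearchRequestParams class, its methods, and all parameter values are hypothetical and are not part of any OpenSearch SDK. Only the parameter names second_rank_type, second_rank_name, raw_query, and user_id come from this topic. How the parameters are attached to an actual request depends on the SDK that you use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of any OpenSearch SDK.
final class SearchRequestParams {
    // Parameter names that this topic lists as required in search requests.
    static final List<String> REQUIRED =
            Arrays.asList("second_rank_type", "second_rank_name", "raw_query");

    // Returns the required parameters that are absent or blank.
    static List<String> missingRequired(Map<String, String> params) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = params.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Builds an example parameter map. Every value is a placeholder.
    static Map<String, String> example() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("second_rank_type", "<type of your fine sort policy>");
        params.put("second_rank_name", "<name of your fine sort policy>");
        params.put("raw_query", "<query entered by the user>");
        // Not required, but including it is said to improve model performance.
        params.put("user_id", "<ID of the user>");
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = example();
        System.out.println("Missing required parameters: " + missingRequired(params));
    }
}
```

Checking the parameters on the client side is only one possible design. It surfaces a missing raw_query or sort policy name before a request is sent.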
Model details page
Basic Information
You can view the following basic information about the model: Created At, Status, Last Training Time, and Latest Version Status.

Configuration Information
Training Fields: After you click Map Training Fields, you can modify or delete training fields in the Map Training Fields panel. After you modify training fields, you must retrain the model.

Scheduled Training: By default, scheduled training is enabled to train the model daily. You can also modify the scheduled training task to customize the training cycle.

Data Verification
Valid values of Data Integrity: Available Data and Abnormal Data.
The integrity report displays the integrity level of the current application. The following table describes the integrity levels.
Integrity level | Description | Upgrade condition |
l0 | The data is completely unavailable. Required core fields are missing and the data size is too small, so subsequent data processing cannot be performed. | l0 --> l1: |
l1 | The core fields of the data are configured and meet the most basic requirements. However, the size of behavioral data is small, and some fields are missing. Optimization that does not rely on behavioral data can be performed. Issues of the behavioral data must be resolved before comprehensive optimization can be performed. | l1 --> l2: |
l2 | The data quality meets the requirements and subsequent optimization can be performed. However, the data size is small, which has a certain impact on the final optimization result. | l2 --> l3: |
l3 | Both the data quality and data size meet the requirements and optimization can be performed. | l3 --> l4: |
l4 | The data size is large and contains tens of millions of data entries. The data integrity is high. | l4 --> l5: |
l5 | The data size is very large and contains hundreds of millions of data entries or more. Deep optimization can be performed. | |
The number of item page views (IPVs) is the number of clicks on search results, that is, the number of behavioral data entries in which the value of the bhv_type field is click.
The condition that exposures are reported and the number of exposures is greater than the number of IPVs means that more behavioral data entries contain bhv_type=expose than bhv_type=click. A product that a user clicks has also been exposed to the user. Therefore, a click must be uploaded as two behavioral data entries: one that contains bhv_type=expose and one that contains bhv_type=click.
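
The relationship between exposure entries and click entries described above can be sketched in Java as follows. The BehaviorLog class and its methods are hypothetical, and the item_id key is assumed for illustration. Only the bhv_type field and its expose and click values come from this topic. The sketch records each click as two entries and then checks whether exposures outnumber clicks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the behavioral data that would be uploaded.
final class BehaviorLog {
    private final List<Map<String, String>> entries = new ArrayList<>();

    private void add(String itemId, String bhvType) {
        Map<String, String> entry = new HashMap<>();
        entry.put("item_id", itemId);   // assumed field name, for illustration only
        entry.put("bhv_type", bhvType);
        entries.add(entry);
    }

    // An item that is shown but not clicked yields one expose entry.
    void recordExposure(String itemId) {
        add(itemId, "expose");
    }

    // A clicked item has also been exposed, so it yields two entries.
    void recordClick(String itemId) {
        add(itemId, "expose");
        add(itemId, "click");
    }

    // Counts the entries whose bhv_type equals the given value.
    long count(String bhvType) {
        return entries.stream()
                .filter(e -> bhvType.equals(e.get("bhv_type")))
                .count();
    }

    // True when more expose entries than click entries have been recorded.
    boolean exposuresExceedClicks() {
        return count("expose") > count("click");
    }

    public static void main(String[] args) {
        BehaviorLog log = new BehaviorLog();
        log.recordExposure("item-1");
        log.recordExposure("item-2");
        log.recordClick("item-3");
        System.out.println("expose=" + log.count("expose") + ", click=" + log.count("click"));
    }
}
```

In this sketch, two items that are only shown and one item that is clicked produce three expose entries and one click entry, so the number of exposures is greater than the number of IPVs.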
Usage notes
You can use CTR prediction models only in Cava-based plug-ins.
This feature is available only for Industry Algorithm Edition - Dedicated Cluster instances.
Each application supports up to three CTR prediction models.
The more training fields you map, the better the model training result generally is.
The raw_query field in upgrade conditions is a required field in search requests. The value of the field must be a unique and independent search query that has search results. For more information, see SDK for Java demo code for implementing the search feature.
Related API operations and SDKs: Algorithms.