OpenSearch: Use Cava to write sort scripts

Last Updated: Feb 06, 2023

Overview

OpenSearch sorts search results in two phases: rough sort and fine sort. Sort scripts written in Cava take effect only in the fine sort phase. This topic describes only how to write sort scripts in Cava. For more information about how to create and use sort scripts, see API operations for Cava management. You can also create sort scripts by using the command-line tool provided by OpenSearch. Compared with sort expressions, Cava sort scripts are more flexible and customizable: you can implement your business logic by using the Cava syntax and the feature libraries that OpenSearch provides.

Classes in Cava sort scripts

To use a custom script for score-based sorting, you must implement the score calculation class defined by OpenSearch. The following code shows the skeleton of this class:

package users.scorer;
import com.aliyun.opensearch.cava.framework.OpsScoreParams;
import com.aliyun.opensearch.cava.framework.OpsScorerInitParams;

class BasicSimilarityScorer {
    // You can define member variables.
    boolean init(OpsScorerInitParams params) {
        // Implement your code and initialize the variables related to a search request, such as the member variables of the class.
        return true;
    }

    double score(OpsScoreParams params) {
        double score = 0;
        // Implement your code and assign the result to the score variable.
        return score;
    }
};

The BasicSimilarityScorer class is in the users.scorer package. Do not change the class name or the package name; otherwise, an error is reported during compilation. The class provides two methods, init and score, in which you implement your business logic.

For each search request, OpenSearch first calls the init method of the BasicSimilarityScorer class to initialize request-level state, such as the member variables of the class. The init method is called only once for each search request. If the init method fails, an error response is returned and the search request is terminated. OpenSearch then calls the score method to calculate the score of each matched document that enters the fine sort, and sorts the documents based on the scores.
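The call sequence described above can be modeled in plain Java. This is a minimal sketch, not Cava code and not the OpenSearch implementation: ModelDoc, ModelScorer, and FineSortModel are hypothetical stand-ins. It also assumes that a failed init is signaled by returning false and that higher scores rank first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a matched document that enters the fine sort.
class ModelDoc {
    final String id;
    final float shopMargin;
    double score;

    ModelDoc(String id, float shopMargin) {
        this.id = id;
        this.shopMargin = shopMargin;
    }
}

// Plain-Java stand-in for a scorer; it only mirrors the init/score shape.
class ModelScorer {
    int initCalls = 0;
    int scoreCalls = 0;
    boolean failInit = false; // set to true to simulate a failed init

    boolean init() {
        initCalls++;
        return !failInit;
    }

    double score(ModelDoc doc) {
        scoreCalls++;
        return doc.shopMargin;
    }
}

class FineSortModel {
    // Returns document ids ordered by score, or null when init fails
    // (modeling a terminated search request).
    static List<String> run(ModelScorer scorer, List<ModelDoc> docs) {
        if (!scorer.init()) {          // init runs once per request
            return null;
        }
        for (ModelDoc doc : docs) {    // score runs once per document
            doc.score = scorer.score(doc);
        }
        List<ModelDoc> sorted = new ArrayList<>(docs);
        // Assumption: higher scores rank first.
        sorted.sort(Comparator.comparingDouble((ModelDoc d) -> d.score).reversed());
        List<String> ids = new ArrayList<>();
        for (ModelDoc doc : sorted) {
            ids.add(doc.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        ModelScorer scorer = new ModelScorer();
        List<ModelDoc> docs = Arrays.asList(
            new ModelDoc("a", 1.0f), new ModelDoc("b", 3.0f), new ModelDoc("c", 2.0f));
        System.out.println(run(scorer, docs));
        System.out.println(scorer.initCalls + " init call, " + scorer.scoreCalls + " score calls");
    }
}
```

For three documents, the model records one init call and three score calls, which is why request-level work belongs in init.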

The input parameter of the init method is OpsScorerInitParams. You can use this parameter to obtain the information about a search request. We recommend that you initialize the member variables of the BasicSimilarityScorer class by using the init method.

The input parameter of the score method is OpsScoreParams. You can use this parameter to obtain the information about a search request and a document. The score method is called once for each document whose score is to be calculated. Therefore, we recommend that you do not obtain request-level information in the score method, such as specific parameters in the kvpairs clause or specific feature objects. Obtain the information about a search request in the init method, and obtain the information about a document in the score method.
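The cost difference behind this recommendation can be sketched in plain Java. This is an illustrative model only: CountingRequest, CachingScorer, LookupCountDemo, and the "boost" parameter are hypothetical and are not part of the OpenSearch Cava libraries. The scorer reads the request parameter once in init and caches it in a member variable, so the number of lookups does not grow with the number of documents.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the request; counts how often a parameter is read.
class CountingRequest {
    private final Map<String, String> kvpairs = new HashMap<>();
    int lookups = 0;

    CountingRequest(String key, String value) {
        kvpairs.put(key, value);
    }

    String getValue(String key) {
        lookups++;
        return kvpairs.get(key);
    }
}

// Reads the request parameter once in init and caches it in a member variable.
class CachingScorer {
    private double boost;

    boolean init(CountingRequest request) {
        String raw = request.getValue("boost"); // request-level work happens here
        if (raw == null) {
            return false;                       // missing parameter: fail the init
        }
        boost = Double.parseDouble(raw);
        return true;
    }

    double score(float shopMargin) {
        return shopMargin * boost;              // document-level work only
    }
}

class LookupCountDemo {
    // Scores every margin and returns how many request lookups were needed.
    static int lookupsFor(float[] margins) {
        CountingRequest request = new CountingRequest("boost", "2.0");
        CachingScorer scorer = new CachingScorer();
        if (!scorer.init(request)) {
            return -1;
        }
        for (float margin : margins) {
            scorer.score(margin);
        }
        return request.lookups;
    }
}
```

Had the lookup been placed in score instead, the counter would grow by one for every document scored.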

You cannot modify the signatures of the init and score methods. For example, you cannot change the return types or the input parameters. Otherwise, an error is reported during compilation.

Sample sort scripts in Cava

package users.scorer;
import cava.lang.CString;
import com.aliyun.opensearch.cava.framework.OpsScoreParams;
import com.aliyun.opensearch.cava.framework.OpsScorerInitParams;
import com.aliyun.opensearch.cava.framework.OpsRequest;
import com.aliyun.opensearch.cava.framework.OpsKvPairs;
import com.aliyun.opensearch.cava.framework.OpsDoc;
import com.aliyun.opensearch.cava.features.similarity.TextRelevance; // Reference the features to be used.
class BasicSimilarityScorer {
    TextRelevance _textRelevance; // Define the scoring feature as a member variable.

    boolean init(OpsScorerInitParams params) {
        if (!params.getDoc().requireAttribute("shop_margin")) { // The attribute fields used during score calculation. The attribute fields must be declared in the init method.
            return false;
        }
        _textRelevance = TextRelevance.create(params, "default", "name"); // The scoring feature, which is declared in the init method.
        return true;
    }

    double score(OpsScoreParams params) {
        float shopMargin = params.getDoc().docFieldFloat("shop_margin"); // Obtain the values of the field in the documents.
        float textScore = _textRelevance.evaluate(params); // Calculate the feature scores.
        double score = textScore * 30.0 + shopMargin;
        return score;
    }
}
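To make the weighting in the sample concrete, the following plain-Java sketch repeats only the arithmetic of its score method. The class ScoreFormulaDemo and the input values are invented for illustration; real values come from the TextRelevance feature and the shop_margin attribute field.

```java
class ScoreFormulaDemo {
    // Same arithmetic as the score method in the sample script.
    static double combine(float textScore, float shopMargin) {
        return textScore * 30.0 + shopMargin;
    }

    public static void main(String[] args) {
        System.out.println(combine(0.5f, 2.0f));  // 0.5 * 30.0 + 2.0
        System.out.println(combine(0.25f, 5.0f)); // 0.25 * 30.0 + 5.0
    }
}
```

With these inputs the results are 17.0 and 12.5: the document with the higher text score ranks first even though its margin is lower, because the text score is multiplied by 30.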

Usage notes

  • We recommend that you define each scoring feature provided by OpenSearch as a member variable of the BasicSimilarityScorer class, initialize the feature in the init method, and calculate scores in the score method. If a scoring feature is initialized in the score method, it is re-created for every document, which wastes significant resources.

  • If the search request carries a custom parameter, we recommend that you define it as a member variable of the BasicSimilarityScorer class and obtain its value in the init method.

  • To read a field of a document, the field must be defined as an attribute field in the application schema. Declare the field in the init method and obtain its value in the score method.

  • Because you can upload only a single file, you can define classes only in the users.scorer package.

  • You can use the import syntax to reference the system libraries provided by OpenSearch. Wildcard imports such as import com.aliyun.opensearch.cava.framework.*; are not supported.

  • For a single search request, sort scripts can use at most 40 MB of memory. If the memory usage exceeds the limit, OpenSearch reports an error but still returns results whenever possible. In those results, only part of the documents are sorted based on scores. Therefore, do not perform memory-intensive operations in the scripts. In particular, do not frequently use the new statement in the score method or create a large number of strings.

  • You can use for loops and invoke functions in sort scripts. To prevent infinite loops and unbounded function invocations, OpenSearch limits the number of loop iterations and function invocations in the sort scripts to no more than 100,000 for a single search request. If the limit is exceeded, an error is reported and the results for the search request are returned early.

  • You can configure the rerank_size parameter to adjust the number of documents that are sorted by using sort scripts. A smaller value helps keep memory usage and the number of loops within the limits.
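The interaction between rerank_size and the two limits above can be estimated with simple arithmetic. The following plain-Java sketch is a rough planning aid under stated assumptions, not a description of how OpenSearch meters scripts. It assumes that every loop iteration and function invocation counts once toward the 100,000 limit, that both budgets are spread evenly across the documents scored, that 1 MB equals 1,048,576 bytes, and that the init method consumes nothing. RerankBudget is a hypothetical helper, and 200 is only an example value.

```java
class RerankBudget {
    static final long MAX_COUNTED_OPS = 100_000L;           // loop and invocation limit per request
    static final long MAX_MEMORY_BYTES = 40L * 1024 * 1024; // 40 MB per request

    // Counted operations available to each score call, rounded down.
    static long opsPerDoc(int rerankSize) {
        requirePositive(rerankSize);
        return MAX_COUNTED_OPS / rerankSize;
    }

    // Bytes available to each score call if nothing is freed, rounded down.
    static long bytesPerDoc(int rerankSize) {
        requirePositive(rerankSize);
        return MAX_MEMORY_BYTES / rerankSize;
    }

    private static void requirePositive(int rerankSize) {
        if (rerankSize <= 0) {
            throw new IllegalArgumentException("rerankSize must be positive");
        }
    }

    public static void main(String[] args) {
        int rerankSize = 200; // example value only
        System.out.println(opsPerDoc(rerankSize) + " counted operations per document");
        System.out.println(bytesPerDoc(rerankSize) + " bytes per document");
    }
}
```

Under these assumptions, a rerank_size of 200 leaves about 500 counted operations and about 209,715 bytes for each document. Halving rerank_size doubles both per-document allowances.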