OpenSearch: BM25

Last Updated: Feb 06, 2023

Overview

The BM25 class is used to calculate the BM25 score of a query term based on a specific field. The BM25 score indicates the relevance of the query term to the specified field. The following formula shows how to calculate a BM25 score:

[Image: BM25 score formula]

tf(i) indicates the number of occurrences of the query term in the specified field.

idf(i) indicates the inverse document frequency (IDF) of the query term.

f(ngram) = ngram > 1 ? ngram*C : 1

K: the factor that adjusts the effect of the term frequency.

C: the weight of the retrieval unit.

B: the factor that adjusts the effect of the length of the specified field.

The BM25 class provides multiple functions that are used to adjust the effects of different parameters in the calculation formula. This allows you to customize the BM25 class based on your needs.
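To make the roles of K, B, and C more concrete, the following Java sketch evaluates the f(ngram) rule given above together with the textbook BM25 term-frequency component. The textbook form is an assumption used for illustration only and may differ from the formula that OpenSearch applies. The class name Bm25Sketch and its method names are hypothetical and are not part of the Cava library.

```java
// Illustrative sketch only. ngramWeight follows the f(ngram) rule from the text;
// termFrequencyPart is the textbook BM25 form (an assumption), not necessarily
// the exact formula that OpenSearch uses.
final class Bm25Sketch {

    // f(ngram) = ngram > 1 ? ngram * C : 1
    static double ngramWeight(int ngram, double c) {
        return ngram > 1 ? ngram * c : 1.0;
    }

    // Textbook BM25 term-frequency component:
    // tf * (K + 1) / (tf + K * (1 - B + B * fieldLength / avgFieldLength))
    static double termFrequencyPart(double tf, double k, double b,
                                    double fieldLength, double avgFieldLength) {
        double lengthNorm = 1.0 - b + b * (fieldLength / avgFieldLength);
        return tf * (k + 1.0) / (tf + k * lengthNorm);
    }

    public static void main(String[] args) {
        // Documented defaults: K = 2.0, B = 0.1, C = 0.7, average field length = 20.
        double k = 2.0, b = 0.1, c = 0.7, avgLen = 20.0;
        System.out.println("f(1) = " + ngramWeight(1, c));
        System.out.println("f(2) = " + ngramWeight(2, c));
        // The same term frequency counts for less in a longer field.
        System.out.println("field length 20: " + termFrequencyPart(3, k, b, 20, avgLen));
        System.out.println("field length 40: " + termFrequencyPart(3, k, b, 40, avgLen));
    }
}
```

In this sketch, a larger K lets repeated occurrences of a term keep raising the score for longer before it levels off, and a larger B penalizes fields that are longer than the average field length more strongly.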

Functions

| Function | Description |
| --- | --- |
| BM25 create(OpsScorerInitParams params, CString indexName, CString fieldName) | Creates a BM25 object. |
| void setGroupScoreMergeOp(CString opName) | Sets an aggregation method for the BM25 scores of multiple query groups. Supported aggregation methods are sum and max. The default aggregation method is sum. |
| void setParamK(double paramK) | Assigns a value to the K parameter. |
| void setParamB(double paramB) | Assigns a value to the B parameter. |
| void setParamC(double paramC) | Assigns a value to the C parameter. |
| void setFieldAvgLength(double avgFieldLength) | Specifies an average field length. |
| double evaluate(OpsScoreParams params) | Calculates the BM25 score of the query term in the specified field. |

Function details

BM25 create(OpsScorerInitParams params, CString indexName, CString fieldName)

Creates a BM25 object based on a specific index and a specific field.

Parameters:

params: the parameters that are used for initialization. For more information, see OpsScorerInitParams.

indexName: the name of an index. The name must be a constant.

fieldName: the name of a field in the specified index. The name must be a constant. The field must be of the TEXT or SHORT_TEXT type. The analyzer can be the general analyzer for Chinese, a custom analyzer, the single character analyzer for Chinese, the analyzer for English, or the analyzer for fuzzy searches.

void setGroupScoreMergeOp(CString opName)

Sets an aggregation method for the BM25 scores of multiple query groups. The default aggregation method is sum. This function can be called only during the initialization of a score calculation object.

Query groups are generated after the original search query is processed by an analyzer. By default, only one query group exists.

Parameter:

opName: the method that is used to aggregate the BM25 scores of multiple query groups. Supported aggregation methods are max and sum.
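The difference between the two aggregation methods can be sketched in plain Java as follows. This is a minimal illustration that assumes each query group has already produced its own BM25 score. The class GroupScoreMerge, its merge helper, and the sample scores are hypothetical and do not show how OpenSearch implements aggregation internally.

```java
import java.util.Arrays;

// Illustrative sketch only: combines per-group scores with "sum" or "max".
final class GroupScoreMerge {

    // Hypothetical helper that mirrors the two documented opName values.
    static double merge(String opName, double[] groupScores) {
        switch (opName) {
            case "sum":
                return Arrays.stream(groupScores).sum();
            case "max":
                // An empty input yields 0.0 in this sketch.
                return Arrays.stream(groupScores).max().orElse(0.0);
            default:
                throw new IllegalArgumentException("Unsupported merge op: " + opName);
        }
    }

    public static void main(String[] args) {
        double[] groupScores = {0.25, 0.5}; // made-up scores for two query groups
        System.out.println("sum: " + merge("sum", groupScores));
        System.out.println("max: " + merge("max", groupScores));
    }
}
```

With sum, every query group contributes to the final score. With max, only the best-matching query group determines it.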

void setParamK(double paramK)

Assigns a value to the K parameter. This function can be called only during the initialization of a score calculation object.

Parameter:

paramK: the value to be assigned to the K parameter. Default value: 2.0.

void setParamB(double paramB)

Assigns a value to the B parameter. This function can be called only during the initialization of a score calculation object.

Parameter:

paramB: the value to be assigned to the B parameter. Default value: 0.1.

void setParamC(double paramC)

Assigns a value to the C parameter. This function can be called only during the initialization of a score calculation object.

Parameter:

paramC: the value to be assigned to the C parameter. Default value: 0.7.

void setFieldAvgLength(double avgFieldLength)

Specifies an average field length. This function can be called only during the initialization of a score calculation object.

Parameter:

avgFieldLength: the average field length that you want to set. Default value: 20.

double evaluate(OpsScoreParams params)

Calculates the BM25 score of the query term in the specified field of the specified index.

Parameter:

params: the parameters that are used for score calculation. For more information, see OpsScoreParams.

Return value: the BM25 score of the query term in the specified field. Valid values: [0,1].

Sample code:

package users.scorer;
import com.aliyun.opensearch.cava.framework.OpsScoreParams;
import com.aliyun.opensearch.cava.framework.OpsScorerInitParams;
import com.aliyun.opensearch.cava.features.similarity.fieldmatch.BM25;

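class BasicSimilarityScorer {
    BM25 _f1;

    // Called once during initialization: create the BM25 object and tune its parameters.
    boolean init(OpsScorerInitParams params) {
        // Bind the scorer to the "text" field of the "text_index" index.
        _f1 = BM25.create(params, "text_index", "text");
        // Override the defaults (K = 2.0, B = 0.1, C = 0.7, average field length = 20).
        _f1.setParamK(3);
        _f1.setParamB(0.5);
        _f1.setParamC(1.2);
        _f1.setFieldAvgLength(30);
        return true;
    }

    // Called for each document: return the BM25 score as the document score.
    double score(OpsScoreParams params) {
        return _f1.evaluate(params);
    }
};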