OpenSearch:BM25F

Last Updated:Jan 09, 2024

Overview

The BM25F class builds on the BM25 class to calculate the text relevance of a query term to multiple specified fields. You can assign a different weight to each field, and the BM25F class uses these weights to merge the BM25 scores of the query term in the individual fields. The following figures show the formulas that are used to calculate a BM25F score.

[Figures 1 and 2: formulas for calculating a BM25F score]
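For orientation, the following plain-Java sketch (not Cava) implements the textbook BM25F formulation. Each field's term frequency is length-normalized with that field's B parameter and multiplied by the field weight. The results are summed into a pseudo term frequency, which is saturated with K and multiplied by an IDF. This is an assumption for illustration only: the formulas that OpenSearch actually uses are the ones in the figures, and, unlike the documented return value of evaluate, this sketch's result is not bounded to [0, 1]. All class, method, and field names are hypothetical.

```java
import java.util.List;

// Hypothetical, standalone sketch of the textbook BM25F formula.
// It is not the OpenSearch implementation, and its result is not bounded to [0, 1].
class Bm25fSketch {

    // Statistics of one query term in one field of one document (hypothetical type).
    static final class FieldStats {
        final double termFreq;       // occurrences of the term in this field
        final double fieldLength;    // length of this field in the document
        final double avgFieldLength; // cf. setFieldAvgLength (default 20)
        final double weight;         // cf. setFieldWeight (default 1.0)
        final double paramB;         // cf. setFieldParamB (default 0.1)

        FieldStats(double termFreq, double fieldLength, double avgFieldLength,
                   double weight, double paramB) {
            this.termFreq = termFreq;
            this.fieldLength = fieldLength;
            this.avgFieldLength = avgFieldLength;
            this.weight = weight;
            this.paramB = paramB;
        }
    }

    // Lucene-style BM25 IDF; docFreq is the number of documents that contain the term.
    static double idf(long totalDocNum, long docFreq) {
        return Math.log(1.0 + (totalDocNum - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Sum over fields of weight * tf / (1 - b + b * len / avgLen).
    static double pseudoTermFreq(List<FieldStats> fields) {
        double sum = 0.0;
        for (FieldStats f : fields) {
            double norm = 1.0 - f.paramB + f.paramB * f.fieldLength / f.avgFieldLength;
            sum += f.weight * f.termFreq / norm;
        }
        return sum;
    }

    // IDF multiplied by the saturated pseudo term frequency tf / (k + tf).
    static double score(List<FieldStats> fields, double paramK, long totalDocNum, long docFreq) {
        double tf = pseudoTermFreq(fields);
        return idf(totalDocNum, docFreq) * tf / (paramK + tf);
    }

    public static void main(String[] args) {
        // Weights, B values, and average lengths mirror the sample code at the end of
        // this topic; the term counts, field lengths, and docFreq are made up.
        List<FieldStats> fields = List.of(
            new FieldStats(1, 10, 10, 10.0, 0.6),   // title
            new FieldStats(3, 100, 100, 2.0, 0.5)); // body
        System.out.println(score(fields, 2.0, 92_000_000L, 1_000L));
    }
}
```

Because both fields in the example have exactly their average length, the pseudo term frequency is simply 10 × 1 + 2 × 3 = 16.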

Functions

Function

Description

BM25F create(OpsScorerInitParams params, CString indexName)

Creates a BM25F object based on the specified index. This function is a factory function.

BM25F create(OpsScorerInitParams params, CString indexName, CString[] fields)

Creates a BM25F object based on the specified index and specified fields in the index. This function is a factory function.

void setGroupScoreMergeOp(CString opName)

Sets an aggregation method for the BM25 scores of multiple query groups. Supported aggregation methods are sum and max. The default aggregation method is sum.

void setParamK(double paramK)

Assigns a value to the K parameter.

void setFieldParamB(CString fieldName, double paramB)

Assigns a value to the B parameter for the specified field.

void setTotalDocNum(long totalDocNum)

Specifies the total number of documents.

void setFieldAvgLength(CString fieldName, int avgFieldLength)

Specifies an average length for the specified field.

void setFieldWeight(CString fieldName, double fieldWeight)

Specifies a weight for the specified field.

double evaluate(OpsScoreParams params)

Calculates the BM25F score of a query term in the specified fields.

Function details

BM25F create(OpsScorerInitParams params, CString indexName)

Creates a BM25F object based on the specified index. This function does not require you to specify fields: all the fields in the specified index are involved in the BM25F score calculation.
Parameters:
params: the parameters that are used for initialization. For more information, see OpsScorerInitParams.
indexName: the name of an index. The name must be a constant.

BM25F create(OpsScorerInitParams params, CString indexName, CString[] fields)

Creates a BM25F object based on the specified index and a list of fields in the index.
Parameters:
params: the parameters that are used for initialization. For more information, see OpsScorerInitParams.
indexName: the name of an index. The name must be a constant.
fields: the list of fields that are involved in the BM25F score calculation.

void setGroupScoreMergeOp(CString opName)

Sets the method that is used to aggregate the BM25 scores of multiple query groups. This function can be invoked only during the initialization of a score calculation object. Query groups are generated when an analyzer processes the original search query. By default, there is one query group.
Parameter:
opName: the aggregation method. Valid values: sum and max. Default value: sum.
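As a hypothetical illustration in plain Java (not the OpenSearch implementation), the two documented aggregation methods reduce a list of per-group scores as follows. The class and method names, and the choice to return 0.0 for an empty list, are assumptions.

```java
import java.util.List;

// Hypothetical illustration of the two documented aggregation methods.
class GroupScoreMerge {
    static double merge(String opName, List<Double> groupScores) {
        switch (opName) {
            case "sum": // default method: add up the scores of all query groups
                return groupScores.stream().mapToDouble(Double::doubleValue).sum();
            case "max": // keep only the best-scoring query group
                return groupScores.stream().mapToDouble(Double::doubleValue).max().orElse(0.0);
            default:
                throw new IllegalArgumentException("unsupported opName: " + opName);
        }
    }

    public static void main(String[] args) {
        List<Double> scores = List.of(0.25, 0.5);
        System.out.println(merge("sum", scores)); // 0.75
        System.out.println(merge("max", scores)); // 0.5
    }
}
```

With max, adding a query group that scores lower than the best group leaves the result unchanged. With sum, every matching group adds to the result.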

void setParamK(double paramK)

Assigns a value to the K parameter (k1). This function can be invoked only during the initialization of a score calculation object.
Parameter:
paramK: the value to assign to the K parameter. Default value: 2.0.
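The K parameter controls how quickly repeated occurrences of a term stop adding to the score. The following plain-Java sketch assumes the bounded form tf / (k1 + tf), which is one common way to keep a BM25-style term score below 1. This document does not state whether OpenSearch uses exactly this form. Under this assumption, a smaller K saturates faster, and with the default K of 2.0, a term frequency of 2 already yields half of the maximum contribution. The class name is hypothetical.

```java
// Hypothetical sketch: bounded BM25-style term-frequency saturation tf / (k1 + tf).
class Saturation {
    static double saturate(double tf, double paramK) {
        return tf / (paramK + tf);
    }

    public static void main(String[] args) {
        // Compare how fast the contribution approaches 1 for several K values.
        for (double k : new double[] {0.5, 2.0, 8.0}) {
            System.out.printf("K=%.1f  tf=1: %.3f  tf=2: %.3f  tf=10: %.3f%n",
                    k, saturate(1, k), saturate(2, k), saturate(10, k));
        }
    }
}
```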

void setFieldParamB(CString fieldName, double paramB)

Assigns a value to the B parameter for the specified field. This function can be invoked only during the initialization of a score calculation object.
Parameters:
fieldName: the name of a field in the specified index. If a list of fields is specified when the current BM25F object is created, the value of fieldName must be in that list and must be a string constant.
paramB: the value to assign to the B parameter. Default value: 0.1.
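The B parameter controls how strongly a field's length is normalized against the average length that is set with setFieldAvgLength. The following plain-Java sketch assumes the textbook BM25 formulation, which may differ from the OpenSearch formulas. In that formulation, the term frequency is divided by 1 - b + b * len / avgLen, so b = 0 ignores length and b = 1 normalizes fully. The class and method names are hypothetical.

```java
// Hypothetical sketch of BM25-style length normalization for one field.
class LengthNorm {
    // 1 - b + b * len / avgLen: equals 1 when the field has the average length.
    static double norm(double fieldLength, double avgFieldLength, double paramB) {
        return 1.0 - paramB + paramB * fieldLength / avgFieldLength;
    }

    // Term frequency after length normalization.
    static double normalizedTf(double tf, double fieldLength, double avgFieldLength, double paramB) {
        return tf / norm(fieldLength, avgFieldLength, paramB);
    }

    public static void main(String[] args) {
        // A field twice as long as average, with the default b = 0.1 and with b = 1.0.
        System.out.println(normalizedTf(2, 40, 20, 0.1));
        System.out.println(normalizedTf(2, 40, 20, 1.0)); // 1.0
    }
}
```

Under this assumption, with the default b = 0.1, a field twice the average length loses only about 9% of its term frequency (2 / 1.1). With b = 1.0, it loses half.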

void setTotalDocNum(long totalDocNum)

Specifies the total number of documents in an application. This function can be invoked only during the initialization of a score calculation object.
Parameter:
totalDocNum: the total number of documents. Default value: 92000000.
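In BM25-family scorers, the total document count typically enters the IDF part of the score. A value far from the real size of your application therefore shifts how rare a term appears to be. The following plain-Java sketch uses the Lucene-style BM25 IDF, ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the total number of documents and n is the number of documents that contain the term. This formula is an assumption for illustration, and the OpenSearch formula may differ. The class name and the document counts in main are hypothetical.

```java
// Hypothetical sketch: Lucene-style BM25 IDF as a function of the total document count.
class IdfSketch {
    static double idf(long totalDocNum, long docFreq) {
        return Math.log(1.0 + (totalDocNum - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        long docFreq = 5_000;
        // Same term, with the default totalDocNum and with a smaller, made-up collection size.
        System.out.println(idf(92_000_000L, docFreq));
        System.out.println(idf(100_000L, docFreq));
    }
}
```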

void setFieldAvgLength(CString fieldName, int avgFieldLength)

Specifies an average length for the specified field. This function can be invoked only during the initialization of a score calculation object.
Parameters:
fieldName: the name of a field in the specified index. If a list of fields is specified when the current BM25F object is created, the value of fieldName must be in that list and must be a string constant.
avgFieldLength: the average length of the specified field. Default value: 20.
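If you do not know the average length of a field, one option is to estimate it offline from a sample of documents and pass the rounded result to setFieldAvgLength, which takes an int. The following plain-Java helper is hypothetical. It approximates length by counting whitespace-separated tokens, which may differ from how OpenSearch counts field length, for example for text without spaces such as Chinese.

```java
import java.util.List;

// Hypothetical helper: estimates an average field length from sample field values.
class AvgFieldLength {
    // Length is approximated as the number of whitespace-separated tokens.
    static int averageLength(List<String> fieldValues) {
        if (fieldValues.isEmpty()) {
            return 0;
        }
        long totalTokens = 0;
        for (String value : fieldValues) {
            String trimmed = value.trim();
            totalTokens += trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
        // setFieldAvgLength takes an int, so round to the nearest integer.
        return (int) Math.round((double) totalTokens / fieldValues.size());
    }

    public static void main(String[] args) {
        List<String> titles = List.of("cheap flights to paris", "paris hotels", "travel guide");
        System.out.println(averageLength(titles)); // 3
    }
}
```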

void setFieldWeight(CString fieldName, double fieldWeight)

Specifies a weight for the specified field. This function can be invoked only during the initialization of a score calculation object.
Parameters:
fieldName: the name of a field in the specified index. If a list of fields is specified when the current BM25F object is created, the value of fieldName must be in that list and must be a string constant.
fieldWeight: the weight of the specified field. Default value: 1.0.
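Field weights decide how much a match in each field counts. The following plain-Java sketch assumes the textbook BM25F formulation, in which each field's length-normalized term frequency is multiplied by its weight before the fields are summed. The OpenSearch formulas may differ. With the weights from the sample code (title 10, body 2), one title occurrence counts as much as five body occurrences. The class and method names are hypothetical.

```java
// Hypothetical sketch: combining per-field (length-normalized) term frequencies by weight.
class FieldWeighting {
    static double weightedTf(double[] normalizedTf, double[] weights) {
        if (normalizedTf.length != weights.length) {
            throw new IllegalArgumentException("one weight per field is required");
        }
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * normalizedTf[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] weights = {10.0, 2.0}; // title, body (values from the sample code)
        System.out.println(weightedTf(new double[] {1, 0}, weights)); // 10.0
        System.out.println(weightedTf(new double[] {0, 5}, weights)); // 10.0
    }
}
```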

double evaluate(OpsScoreParams params)

Calculates the BM25F score of a query term in the specified fields.
Parameter:
params: the parameters that are used for score calculation. For more information, see OpsScoreParams.
Return value: the BM25F score of the query term in the specified fields. Value range: [0, 1].
Sample code:

package users.scorer;
import com.aliyun.opensearch.cava.framework.OpsScoreParams;
import com.aliyun.opensearch.cava.framework.OpsScorerInitParams;
import com.aliyun.opensearch.cava.features.similarity.fieldmatch.BM25F;

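class BasicSimilarityScorer {
    BM25F _f1;

    // Initialization: all BM25F setters must be called here.
    boolean init(OpsScorerInitParams params) {
        // Compute BM25F over the title and body fields of the "default" index.
        CString[] fields1 = {"title", "body"};
        _f1 = BM25F.create(params, "default", fields1);

        // title: short field (average length 10) with a high weight.
        _f1.setFieldAvgLength("title", 10);
        _f1.setFieldWeight("title", 10D);
        _f1.setFieldParamB("title", 0.6);

        // body: longer field (average length 100) with a lower weight.
        _f1.setFieldAvgLength("body", 100);
        _f1.setFieldWeight("body", 2D);
        _f1.setFieldParamB("body", 0.5);
        return true;
    }

    // Scoring: returns the BM25F score of the query in title and body, in [0, 1].
    double score(OpsScoreParams params) {
        return _f1.evaluate(params);
    }
};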