Rough sort functions

This topic describes the functions available for rough sort.

Overview

A rough sort is a preliminary filtering stage. It quickly identifies high-quality documents from the initial search results and selects the top N documents. These selected documents then undergo a fine sort, which involves more detailed scoring to determine the final ranking. The rough sort significantly impacts performance, while the fine sort primarily determines search quality. Therefore, a rough sort should be simple and efficient, using only the key factors from the fine sort. You configure both rough sort and fine sort using sort expressions.
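
The two-stage flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how OpenSearch implements it: the function `two_stage_rank`, the document fields `bm25` and `clicks`, and both scoring lambdas are hypothetical.

```python
import heapq

def two_stage_rank(docs, rough_score, fine_score, top_n):
    """Select top_n docs with a cheap score, then order them with a detailed score."""
    # Stage 1 (rough sort): keep only the top_n candidates by the cheap score.
    candidates = heapq.nlargest(top_n, docs, key=rough_score)
    # Stage 2 (fine sort): re-rank just those candidates with the detailed score.
    return sorted(candidates, key=fine_score, reverse=True)

# Hypothetical documents and scores, for illustration only.
docs = [
    {"id": "a", "bm25": 0.9, "clicks": 10},
    {"id": "b", "bm25": 0.8, "clicks": 500},
    {"id": "c", "bm25": 0.2, "clicks": 9000},
]
ranked = two_stage_rank(
    docs,
    rough_score=lambda d: d["bm25"],                       # simple and fast
    fine_score=lambda d: d["bm25"] + d["clicks"] / 1000,   # more detailed
    top_n=2,
)
print([d["id"] for d in ranked])
```

In this sketch, document "c" never reaches the fine sort because its rough score is too low, even though its fine score would be the highest. That is why the rough sort expression should include the key factors of the fine sort.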

Functions

`static_bm25`: Static text relevance

For more information, see static_bm25.
Parameters: None.
Return value: A float in the range of [0, 1].
Use case: To use the static text relevance score in a rough sort, set the expression to static_bm25().
Usage notes: The default rough sort expression uses static_bm25().

Note

When the static_bm25() score can exceed 1:

If you configure query analysis, such as synonym expansion, the score may be higher than 1. For example, suppose the original query is query=index:'apple'. After you configure synonyms, the query is expanded to query=index:'apple' OR index:'iphone'. If a document contains both "apple" and "iphone", the static_bm25() scores for the two terms are added together. As a result, the final rough sort score can exceed 1.
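
The arithmetic behind this note can be sketched as follows. The per-term scores are made-up values, and the helper `rough_score` is hypothetical. It only models the summation described above, not the actual static_bm25() computation.

```python
def rough_score(doc_terms, query_terms, per_term_score):
    """Sum the per-term scores of every query term that the document contains."""
    return sum(per_term_score[t] for t in query_terms if t in doc_terms)

# Hypothetical per-term scores, each within [0, 1].
per_term_score = {"apple": 0.7, "iphone": 0.6}
doc = {"apple", "iphone"}

original = rough_score(doc, ["apple"], per_term_score)            # query=index:'apple'
expanded = rough_score(doc, ["apple", "iphone"], per_term_score)  # after synonym expansion

print(original <= 1, expanded > 1)
```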

`exact_match_boost`: Maximum boost weight

For more information, see exact_match_boost.
Parameters: None.
Return value: An integer in the range of [0, 99].
Use case: Sort results by the boost weight specified for each matching term in a query. For example, consider the query query=default:'Open Search'^60 OR default:'opensearch'^50. If document A contains "Open Search" and document B contains "opensearch", document A should be ranked higher than document B. To achieve this, set the rough sort expression to exact_match_boost().
Usage notes:
- This function requires the target field to be indexed.
- For query terms without a specified boost value, the default boost value is 99.
- For exclusive applications, exact_match_boost in a rough sort accepts one of the following parameter options: '', 'sum', or 'max'. To configure it on the policy editing page, select the exact_match_boost() scoring feature, choose an aggregation method (max or sum) from the search field drop-down list, and then set the corresponding weight.
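
The ranking in the use case above can be modeled with a small Python sketch. It is a simplification under stated assumptions: the helper name mirrors the function only for readability, matching is done by plain substring search rather than against an index, and the `mode` argument is a hypothetical stand-in for the max and sum aggregation options.

```python
def exact_match_boost(doc_text, boosted_terms, mode="max"):
    """Return the aggregated boost of the query terms found in doc_text."""
    # Collect the boost of every query term that appears in the document.
    matched = [boost for term, boost in boosted_terms if term in doc_text]
    if not matched:
        return 0
    # Assumption: 'sum' adds the matched boosts, anything else takes the maximum.
    return sum(matched) if mode == "sum" else max(matched)

# query=default:'Open Search'^60 OR default:'opensearch'^50
query = [("Open Search", 60), ("opensearch", 50)]
doc_a = "Open Search is a hosted search service"   # hypothetical document text
doc_b = "opensearch quick start"                   # hypothetical document text

score_a = exact_match_boost(doc_a, query)
score_b = exact_match_boost(doc_b, query)
print(score_a > score_b)
```

Document A matches only the term boosted to 60 and document B only the term boosted to 50, so A sorts first.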

`timeliness`: Timeliness score

Syntax: timeliness(pubtime). For more information, see timeliness.
Parameters: pubtime is the field to evaluate. The field must be of the int type, and the value must be a UNIX timestamp in seconds.
Return value: A float in the range of [0, 1]. A larger value indicates a more recent document. If the timestamp is in the future, the function returns 0.
Use cases: Calculate the timeliness score of the create_timestamp field in a rough sort. Set the expression to timeliness(create_timestamp).
Usage notes: You must configure the pubtime field as an attribute field.

`timeliness_ms`: Timeliness score (millisecond timestamps)

Syntax: timeliness_ms(pubtime). For more information, see timeliness_ms.
Parameters: pubtime is the field to evaluate. The field must be of the int type, and the value must be a UNIX timestamp in milliseconds.
Return value: A float in the range of [0, 1]. A larger value indicates a more recent document. If the timestamp is in the future, the function returns 0.
Use cases: Calculate the timeliness score of the create_timestamp field in a rough sort. Set the expression to timeliness_ms(create_timestamp).
Usage notes: You must configure the pubtime field as an attribute field.
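
The only difference between timeliness and timeliness_ms is the unit of the timestamp stored in the field. As a minimal sketch, the following Python code shows how an application might produce both forms when it writes documents. The variable names are illustrative, and the publication date is an arbitrary example.

```python
from datetime import datetime, timezone

# Arbitrary example publication time, expressed in UTC.
published = datetime(2024, 1, 1, tzinfo=timezone.utc)

# UNIX timestamp in seconds: the unit expected by timeliness(pubtime).
seconds = int(published.timestamp())

# UNIX timestamp in milliseconds: the unit expected by timeliness_ms(pubtime).
milliseconds = int(published.timestamp() * 1000)

print(seconds, milliseconds)
```

Passing a field in the wrong unit is an easy mistake to make. For instance, a millisecond value read as seconds would point far into the future, and both functions return 0 for future timestamps.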

`normalize`: Normalization function

During relevance calculation, a document's quality is measured across different dimensions. The scores from these dimensions may have vastly different ranges. For example, page clicks can be in the millions, while a static text relevance score is between 0 and 1. These scores are not directly comparable. To use these different scores in a single formula, you must normalize them to a common range. The normalize function simplifies this process. It supports three normalization methods: linear, logarithmic, and arctangent. The function automatically selects a method based on the parameters you provide:
- If you specify only the value parameter, it uses the arctangent method.
- If you specify value and max, it uses the logarithmic method.
- If you specify value, max, and min, it uses the linear method.
Syntax: normalize(value, max, min). For more information, see normalize.
Parameters: value is the value to normalize, which must be a double. The value can be from a document field or another expression. max is the optional maximum input value, which must be a double. min is the optional minimum input value, which must be a double.
Return value: A double value in the range of [0, 1].
Use cases:
- Scenario 1: To normalize the price field without knowing its value range, use the following expression: normalize(price).
- Scenario 2: To normalize the price field when you only know its maximum value is 100, use the following expression: normalize(price, 100).
- Scenario 3: To normalize the price field when you know its maximum value is 100 and its minimum value is 1, use the following expression: normalize(price, 100, 1).
- Scenario 4: To normalize the result of a distance function to the [0, 1] range, use the following expression: normalize(distance(longitude_in_doc, latitude_in_doc, longitude_in_query, latitude_in_query)).
Usage notes:
- Fields used as function parameters must be configured as attribute fields.
- When using the arctangent method for normalization, if the value is less than 0, the normalized value is 0.
- When using the logarithmic method for normalization, the max value must be greater than 1.
- When using the linear method for normalization, the max value must be greater than the min value.
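
The parameter-based method selection and the constraints above can be sketched in Python. The exact formulas are not stated in this topic, so the ones below are assumptions based on common forms of each method: atan(value) * 2 / π, log(value) / log(max), and (value - min) / (max - min). Raising ValueError for invalid parameters and clamping the result to [0, 1] are also illustrative choices, not documented behavior.

```python
import math

def normalize(value, max_value=None, min_value=None):
    """Map value into [0, 1]; the method depends on which parameters are given."""
    if max_value is None:
        # Arctangent method (value only). Negative input normalizes to 0.
        if value < 0:
            return 0.0
        result = math.atan(value) * 2 / math.pi          # assumed formula
    elif min_value is None:
        # Logarithmic method (value and max). max must be greater than 1.
        if max_value <= 1:
            raise ValueError("max must be greater than 1")
        result = math.log(value) / math.log(max_value) if value > 0 else 0.0  # assumed formula
    else:
        # Linear method (value, max, and min). max must be greater than min.
        if max_value <= min_value:
            raise ValueError("max must be greater than min")
        result = (value - min_value) / (max_value - min_value)  # assumed formula
    # Clamp so that out-of-range inputs still yield a value in [0, 1].
    return min(1.0, max(0.0, result))

print(normalize(50))            # arctangent, like normalize(price)
print(normalize(50, 100))       # logarithmic, like normalize(price, 100)
print(normalize(50, 100, 1))    # linear, like normalize(price, 100, 1)
```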
