Rough sort functions

Last Updated: Sep 09, 2021

Rough sort selects the top N highest-scoring documents from all retrieved documents. Fine sort then scores and sorts those N documents in more detail, so that users obtain the documents that best match their requirements. Rough sort mainly affects search performance because it runs over every retrieved document, whereas fine sort determines the final order of the results. A rough sort expression should therefore be simple and efficient, and use only the key factors of the fine sort. Both stages are configured by using sort expressions. This topic describes the feature functions that can be used for rough sort.
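The two-stage flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the service is implemented: the function name two_stage_sort, the document fields text_score and freshness, and the scoring lambdas are all hypothetical.

```python
from typing import Callable, Dict, List

Doc = Dict[str, float]


def two_stage_sort(
    docs: List[Doc],
    rough_score: Callable[[Doc], float],
    fine_score: Callable[[Doc], float],
    top_n: int,
) -> List[Doc]:
    """Keep the top_n documents by a cheap score, then reorder them by a costlier one."""
    # Rough sort: the cheap score is evaluated for every retrieved document.
    candidates = sorted(docs, key=rough_score, reverse=True)[:top_n]
    # Fine sort: the costlier score is evaluated only for the surviving candidates.
    return sorted(candidates, key=fine_score, reverse=True)


# Hypothetical documents with precomputed feature values.
docs = [
    {"id": 1, "text_score": 0.9, "freshness": 0.1},
    {"id": 2, "text_score": 0.8, "freshness": 0.9},
    {"id": 3, "text_score": 0.2, "freshness": 1.0},
]

result = two_stage_sort(
    docs,
    rough_score=lambda d: d["text_score"],
    fine_score=lambda d: 0.5 * d["text_score"] + 0.5 * d["freshness"],
    top_n=2,
)
ids = [d["id"] for d in result]
print(ids)
```

In this sketch, document 3 never reaches the fine sort even though its fine score would beat document 1, which is why the rough sort expression should reflect the key factors of the fine sort.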

Feature functions

static_bm25: returns the static text relevance score that indicates how well the document matches the query

  1. Syntax: static_bm25()

  2. Parameters: None

  3. Return value: The return value is of the FLOAT type. Valid values: [0,1].

  4. Scenario: You can use the static_bm25() function in a rough sort expression to calculate text scores.

  5. Usage notes:

    • By default, static_bm25() takes effect if you use the default rough sort expression.

exact_match_boost: calculates the maximum weight of a specified term in a query

  1. Syntax: exact_match_boost()

  2. Parameters: None

  3. Return value: The return value is of the INT type. Valid values: [0,99].

  4. Scenario: Suppose the query clause is query=default:'Open' ^ 60 OR default:'Search' ^ 50 and you want to sort results by the boost weight that is specified in the query clause for the matching term. If Document A contains the term Open and Document B contains the term Search, Document A ranks higher than Document B because Open has the higher boost weight (60 versus 50). In this case, set the rough sort expression to exact_match_boost().

  5. Usage notes:

    • The fields that you reference in the parameters of this function must be configured as index fields.

    • The default boost weight of the term for which no boost weight is specified in the query clause is 99.

    • If the exact_match_boost function is used in the rough sort expression for an exclusive application, the function parameter can be set to '', 'sum', or 'max'.
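The behavior described in the scenario above can be approximated with the following Python sketch. It is an interpretation of this topic, not the service's actual implementation: the function name exact_match_boost_sketch and the data shapes are hypothetical, and returning 0 when no query term matches is an assumption based only on the documented value range [0,99].

```python
from typing import Dict, Optional, Set

DEFAULT_BOOST = 99  # Documented default for a term without an explicit boost weight.


def exact_match_boost_sketch(
    query_boosts: Dict[str, Optional[int]], doc_terms: Set[str]
) -> int:
    """Return the largest boost weight among the query terms found in the document."""
    matched = [
        DEFAULT_BOOST if boost is None else boost
        for term, boost in query_boosts.items()
        if term in doc_terms
    ]
    # Assumption: a document that matches no query term scores 0.
    return max(matched, default=0)


# query=default:'Open' ^ 60 OR default:'Search' ^ 50
query_boosts = {"Open": 60, "Search": 50}

score_a = exact_match_boost_sketch(query_boosts, {"Open", "Source"})
score_b = exact_match_boost_sketch(query_boosts, {"Search", "Engine"})
print(score_a, score_b)
```

Under these assumptions, Document A scores higher than Document B, which matches the ordering described in the scenario.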

timeliness: returns the timeliness score that indicates how new the document is

  1. Syntax: timeliness(pubtime)

  2. Parameters:

    pubtime: the field whose timeliness is to be evaluated. The field must be of the INT type. Unit: seconds.

  3. Return value: The return value is of the FLOAT type. Valid values: [0,1]. A greater value indicates better timeliness. If the value of the field is later than the current time, 0 is returned.

  4. Scenario: You can use the timeliness(create_timestamp) function in a rough sort expression to calculate the timeliness score of the create_timestamp field.

  5. Usage notes:

    • The pubtime field must be configured as an attribute field.

timeliness_ms: returns the timeliness score that indicates how new the document is

  1. Syntax: timeliness_ms(pubtime)

  2. Parameters:

    pubtime: the field whose timeliness is to be evaluated. The field must be of the INT type. Unit: milliseconds.

  3. Return value: The return value is of the FLOAT type. Valid values: [0,1]. A greater value indicates better timeliness. If the value of the field is later than the current time, 0 is returned.

  4. Scenario: You can use the timeliness_ms(create_timestamp) function in a rough sort expression to calculate the timeliness score of the create_timestamp field.

  5. Usage notes:

    • The pubtime field must be configured as an attribute field.
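The only difference between timeliness and timeliness_ms is the unit of the pubtime field. The sketch below shows one way to produce an INT value in each unit when you prepare documents. It assumes that the field stores a Unix timestamp (time elapsed since 1970-01-01 UTC), which this topic does not state explicitly, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_seconds(moment: datetime) -> int:
    """Integer seconds since the Unix epoch, for a field passed to timeliness()."""
    return (moment - EPOCH) // timedelta(seconds=1)


def to_epoch_millis(moment: datetime) -> int:
    """Integer milliseconds since the Unix epoch, for a field passed to timeliness_ms()."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


# The datetime must be timezone-aware so that it can be compared with EPOCH.
published = datetime(2021, 9, 9, tzinfo=timezone.utc)
seconds_value = to_epoch_seconds(published)
millis_value = to_epoch_millis(published)
print(seconds_value, millis_value)
```

Passing a millisecond value to timeliness, or a second value to timeliness_ms, would misrepresent the age of the document, so the field unit and the function should be chosen together.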

normalize: normalizes a score to a value in the range [0,1]

  1. Scenario overview: The relevance of a document is calculated from several dimensions, and the scores from different dimensions may fall in different value ranges. For example, a web page can have millions of clicks, whereas its text relevance score is a value in [0,1]. Values in such different ranges cannot be compared directly. The normalize function maps them to the same range so that the normalized scores can be used in further calculation. The function supports three normalization methods, and the method is chosen based on the parameters that you specify:

    • If only the value parameter is set, the arctangent function is used.

    • If the value and max parameters are set, the logarithmic function is used.

    • If the value, max, and min parameters are all set, the linear function is used.

  2. Syntax: normalize(value, max, min)

  3. Parameters:

    value: the field or expression whose value you want to normalize. The field value or the return value of the expression must be of the DOUBLE type.

    max: the maximum value of the input value range. This parameter is optional. The value must be of the DOUBLE type.

    min: the minimum value of the input value range. This parameter is optional. The value must be of the DOUBLE type.

  4. Return value: The return value is of the DOUBLE type. Valid values: [0,1].

  5. Scenario 1: You want to normalize the value of the price field but do not know the value range of the price field. In this case, you can use the normalize function in the following format: normalize(price).

    Scenario 2: You want to normalize the value of the price field and know only the maximum value in the value range of the price field. In this case, you can use the normalize function in the following format: normalize(price, 100).

    Scenario 3: You want to normalize the value of the price field and know that its maximum and minimum values are 100 and 1. In this case, you can use the normalize function in the following format: normalize(price, 100, 1).

    Scenario 4: You want to normalize the return value of the distance function to a value in [0,1]. In this case, you can use the normalize function in the following format: normalize(distance(longitude_in_doc, latitude_in_doc, longitude_in_query, latitude_in_query)).

  6. Usage notes:

    • The fields that you reference in the parameters of this function must be configured as attribute fields.

    • If the arctangent function is used for normalization and the field value or the return value of the specified expression is smaller than 0, the return value of the normalize function is 0.

    • If the logarithmic function is used for normalization, the value of the max parameter must be greater than 1.

    • If the linear function is used for normalization, the value of the max parameter must be greater than that of the min parameter.
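The way the normalization method depends on the number of parameters can be sketched as follows. This topic does not give the exact formulas, so the ones below are common textbook forms chosen as assumptions: atan(value) * 2 / π for arctangent, log(value) / log(max) for logarithmic, and (value - min) / (max - min) for linear. The function name normalize_sketch, the clamping to [0,1], and the use of exceptions for invalid parameters are also hypothetical.

```python
import math
from typing import Optional


def normalize_sketch(
    value: float,
    max_value: Optional[float] = None,
    min_value: Optional[float] = None,
) -> float:
    """Map value into [0, 1], choosing the method from the parameters supplied."""
    if max_value is None:
        # Only value is set: arctangent normalization (assumed formula).
        if value < 0:
            return 0.0  # Documented behavior for negative input.
        result = math.atan(value) * 2 / math.pi
    elif min_value is None:
        # value and max are set: logarithmic normalization (assumed formula).
        if max_value <= 1:
            raise ValueError("max must be greater than 1 for log normalization")
        # Assumption: non-positive input, for which log is undefined, maps to 0.
        result = math.log(value) / math.log(max_value) if value > 0 else 0.0
    else:
        # value, max, and min are set: linear normalization (assumed formula).
        if max_value <= min_value:
            raise ValueError("max must be greater than min for linear normalization")
        result = (value - min_value) / (max_value - min_value)
    # Clamp so that out-of-range input still yields a value in [0, 1].
    return min(1.0, max(0.0, result))


print(normalize_sketch(1.0))           # arctangent
print(normalize_sketch(10.0, 100.0))   # logarithmic
print(normalize_sketch(50.5, 100, 1))  # linear
```

The three branches mirror normalize(price), normalize(price, 100), and normalize(price, 100, 1) from the scenarios above, and the checks on max correspond to the last two usage notes.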

category_score: the category prediction function, which returns a score that indicates how well the specified category field matches the category that is predicted for the query

  1. Syntax:

    category_score(cate_id)

  2. Parameters:

    cate_id: the field that is used as a category ID for model training. This field must be of the INT type.

  3. Return value: The return value is of the INT type. Valid values: [0,2].

  4. Scenario: You can use the category_score(cate_id) function in a rough sort expression to predict categories.

  5. Usage notes: