
OpenSearch: Manage evaluation tasks

Last Updated: Sep 04, 2024

This topic describes how to create an evaluation task to evaluate the effects of conversational search. The conversational search process to be evaluated consists of the following three steps: (1) A user asks a question. (2) The system retrieves relevant content. (3) A large language model (LLM) generates an answer.
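The following sample code outlines those three steps in Python. The retrieve_documents and generate_answer functions are hypothetical placeholders, not OpenSearch SDK operations; an evaluation task scores the output of such a pipeline for each question in the evaluation dataset.

```python
# Hypothetical sketch of the conversational search pipeline that an
# evaluation task scores. These functions are placeholders, not
# OpenSearch SDK operations.

def retrieve_documents(question: str, top_n: int = 5) -> list[str]:
    """Step 2: retrieve the content that is most relevant to the question."""
    return [f"document about {question}"][:top_n]  # placeholder retrieval

def generate_answer(question: str, documents: list[str]) -> str:
    """Step 3: a large language model (LLM) generates an answer."""
    return f"Answer to {question!r} based on {len(documents)} document(s)."

question = "How do I create an evaluation task?"  # Step 1: a user asks a question
documents = retrieve_documents(question)
print(generate_answer(question, documents))
```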

Usage notes

You are charged for effect evaluation based on the computing resources consumed during the evaluation.

Procedure

  1. Log on to the OpenSearch console.

  2. In the top navigation bar, select the region in which your instance resides. In the upper-left corner, select OpenSearch LLM-Based Conversational Search Edition.

  3. On the Instance Management page, find the instance that you want to manage and click Manage in the Actions column. On the details page of the instance, click Effect Comparison in the left-side pane.

  4. On the Evaluation Task tab, click Create Evaluation Task. On the Create Evaluation Task page, enter a task name, select an evaluation dataset, and then click Configure Parameters. In the Configure Parameters panel, configure the parameters that are described in the following tables.

| Parameter | Description |
| --- | --- |
| Select Model | The model used for conversational search. For more information about available models, see Model management. Note: An available model is a model that can be used to test the effects of conversational search. |
| Prompt | The prompt used for conversational search. You must configure a prompt template in advance. For more information, see Manage prompts. |

Prompt parameters

| Parameter | Type | Required | Valid values | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| attitude | String | No | normal, polite, patience | normal | The tone of the conversation. |
| rule | String | No | detailed, stepbystep | detailed | The level of detail in the conversation. |
| noanswer | String | No | sorry, uncertain | sorry | The information that is returned if the system fails to find an answer to the question. |
| language | String | No | Chinese, English, Thai, Korean | Chinese | The language of the answer. |
| role | Boolean | No | true, false | true | Specifies whether to enable a custom role to answer the question. |
| role_name | String | No | - | AI Assistant | The name of the custom role. Example: AI Assistant. |
| out_format | String | No | text, table, list, markdown | text | The format of the answer. |
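The prompt parameters map naturally to a key-value payload. The following Python sketch shows one plausible combination with basic checks against the valid values listed above. The dictionary layout is illustrative and is not the exact request schema.

```python
# Illustrative prompt-parameter payload; the exact request schema may differ.
prompt_params = {
    "attitude": "polite",         # normal | polite | patience
    "rule": "stepbystep",         # detailed | stepbystep
    "noanswer": "sorry",          # sorry | uncertain
    "language": "English",        # Chinese | English | Thai | Korean
    "role": True,                 # enable a custom role
    "role_name": "AI Assistant",  # name of the custom role
    "out_format": "markdown",     # text | table | list | markdown
}

VALID = {
    "attitude": {"normal", "polite", "patience"},
    "rule": {"detailed", "stepbystep"},
    "noanswer": {"sorry", "uncertain"},
    "language": {"Chinese", "English", "Thai", "Korean"},
    "out_format": {"text", "table", "list", "markdown"},
}
for key, allowed in VALID.items():
    assert prompt_params[key] in allowed, f"invalid value for {key}"
```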

Document retrieval parameters

| Parameter | Type | Required | Valid values | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| filter | String | No | - | - | The field that is used to filter documents. Example: filter=field=value. |
| top_n | INT | No | (0, 50] | 5 | The number of documents to retrieve. |
| sf | Float | No | [0, +∞) | 1.3 | The threshold for the vector similarity of the documents to retrieve. A greater value indicates a lower vector similarity. |
| dense_weight | Float | No | (0, 1) | 0.7 | The weight of the dense vector. This parameter is available only if you select a sparse vector model. The weight of the sparse vector is calculated as 1 - dense_weight. |
| formula | String | No | - | Vector similarity | The formula based on which the retrieved documents are sorted. |
| operator | String | No | - | AND | The operator between text tokens during text retrieval. |
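As a sketch, the document retrieval parameters could be assembled and range-checked as follows. The dictionary layout and the field=value filter string are illustrative assumptions.

```python
# Illustrative document-retrieval parameters with range checks.
retrieval_params = {
    "filter": "category=faq",  # hypothetical field=value filter expression
    "top_n": 5,                # number of documents to retrieve, in (0, 50]
    "sf": 1.3,                 # similarity threshold; greater = lower similarity
    "dense_weight": 0.7,       # dense-vector weight, in (0, 1)
    "formula": "Vector similarity",
    "operator": "AND",         # operator between text tokens
}

assert 0 < retrieval_params["top_n"] <= 50
assert retrieval_params["sf"] >= 0
assert 0 < retrieval_params["dense_weight"] < 1
# For sparse vector models, the sparse weight is derived, not set directly:
sparse_weight = 1 - retrieval_params["dense_weight"]  # 0.3 here
```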

Reference image parameters

| Parameter | Type | Required | Valid values | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| sf | Float | No | [0, +∞) | 1 | The threshold for the vector similarity of reference images. For sparse vector models, a greater value indicates a greater vector similarity. For dense vector models, a greater value indicates a lower vector similarity. |
| dense_weight | Float | No | (0, 1) | 0.7 | The weight of the dense vector. This parameter is available only if you select a sparse vector model. The weight of the sparse vector is calculated as 1 - dense_weight. |
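Because dense_weight appears in both tables, a short worked example may help: with dense_weight = 0.7, the sparse weight is 1 - 0.7 = 0.3. The weighted sum below only illustrates that relation; the actual scoring formula used by OpenSearch is not documented here.

```python
# Illustrative only: combine dense and sparse scores with dense_weight.
dense_weight = 0.7
sparse_weight = 1 - dense_weight        # 0.3, per the tables above

dense_score, sparse_score = 0.82, 0.55  # hypothetical per-document scores
combined = dense_weight * dense_score + sparse_weight * sparse_score
print(round(combined, 3))               # 0.739
```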

Query understanding parameters

| Parameter | Type | Required | Valid values | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| query_extend | Boolean | No | true, false | false | Specifies whether to extend queries. You can enable this feature to improve retrieval performance. |
| query_exten_num | INT | No | (0, +∞) | 5 | The number of extended queries. |
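A minimal sketch of the query understanding parameters, assuming the same key-value layout as the previous examples:

```python
# Illustrative query-understanding parameters.
query_params = {
    "query_extend": True,  # rewrite the query into several variants
    "query_exten_num": 5,  # number of extended queries, must be > 0
}
assert query_params["query_exten_num"] > 0
```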

Manual intervention parameters

| Parameter | Type | Required | Valid values | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| sf | Float | No | [0, 2] | 0.3 | The threshold for manual intervention. A greater value makes intervention entries easier to match. |
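The manual intervention threshold behaves like the retrieval thresholds above: a greater sf value makes intervention entries match more easily. The comparison direction in the sketch below (an entry matches when its distance is at most sf) is an assumption used only to illustrate that behavior.

```python
# Illustrative: a greater sf threshold lets more intervention entries match.
def entry_matches(distance: float, sf: float = 0.3) -> bool:
    # Assumed direction: a smaller distance means a closer match, so raising
    # sf (up to the documented maximum of 2) admits more entries.
    return distance <= sf

print(entry_matches(0.25))       # True with the default threshold of 0.3
print(entry_matches(0.25, 0.1))  # False with a stricter threshold
```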

Other parameters

| Parameter | Type | Required | Valid values | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| return_hits | Boolean | No | true, false | false | Specifies whether to return the document retrieval results. |
| csi_level | String | No | none, loose, strict | strict | The content moderation level. none: the content is not moderated. loose: the results are moderated and blocked if restricted content is detected; in this case, no results are returned. strict: the results are moderated and blocked if restricted or suspicious content is detected; in this case, no results are returned. |
| history_max | INT | No | (0, 20] | 20 | The maximum number of conversation rounds that the system takes into account. You can specify up to 20 rounds. |
| link | Boolean | No | true, false | false | Specifies whether to return the source of each retrieved document. |
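The remaining parameters can be validated in the same way. The sketch below checks csi_level against its documented values and trims a hypothetical conversation history to history_max rounds; the history structure is an assumption for illustration.

```python
# Illustrative handling of the remaining parameters.
other_params = {
    "return_hits": True,    # also return the document retrieval results
    "csi_level": "strict",  # none | loose | strict
    "history_max": 20,      # conversation rounds considered, in (0, 20]
    "link": True,           # return the source of each retrieved document
}

assert other_params["csi_level"] in {"none", "loose", "strict"}
assert 0 < other_params["history_max"] <= 20

# Keep only the most recent history_max rounds of a hypothetical history.
history = [("q1", "a1"), ("q2", "a2")]  # (question, answer) rounds
history = history[-other_params["history_max"]:]
```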

  5. After you configure the preceding parameters, click OK. The system generates an overall score after the evaluation is complete.

    Click Evaluation Report to view the evaluation result of each Q&A pair. If a result is inaccurate, click Manual Evaluation to revise it manually.