This topic describes how to create an evaluation task to evaluate the effects of conversational search. The conversational search process to be evaluated consists of the following three steps: (1) A user asks a question. (2) The system retrieves relevant content. (3) A large language model (LLM) generates an answer.
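As an orientation, the following minimal sketch illustrates the retrieve-then-generate flow that an evaluation task scores. The functions are simplified placeholders, not actual OpenSearch LLM-Based Conversational Search Edition APIs.

```python
# Minimal sketch of the conversational search flow that an evaluation task scores.
# retrieve_documents() and generate_answer() are placeholders for illustration,
# not actual OpenSearch LLM-Based Conversational Search Edition APIs.

def retrieve_documents(question: str, top_n: int = 5) -> list[str]:
    # Placeholder: a real system performs vector and text retrieval here.
    return ["<relevant document 1>", "<relevant document 2>"][:top_n]

def generate_answer(question: str, context: list[str]) -> str:
    # Placeholder: a real system calls the selected LLM with a prompt template here.
    return f"Answer to '{question}' based on {len(context)} retrieved documents."

def conversational_search(question: str) -> str:
    documents = retrieve_documents(question)     # Step 2: retrieve relevant content
    return generate_answer(question, documents)  # Step 3: the LLM generates an answer

print(conversational_search("How do I create an evaluation task?"))  # Step 1: user question
```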
Usage notes
You are charged for effect evaluation based on the computing resources consumed during the evaluation.
Procedure
1. Log on to the OpenSearch console.
2. In the top navigation bar, select the region in which your instance resides. In the upper-left corner, select OpenSearch LLM-Based Conversational Search Edition.
3. On the Instance Management page, find the instance that you want to manage and click Manage in the Actions column. On the details page of the instance, click Effect Comparison in the left-side navigation pane.
4. On the Evaluation Task tab, click Create Evaluation Task. On the Create Evaluation Task page, enter a task name, select an evaluation dataset, and then click Configure Parameters. In the Configure Parameters panel, configure the parameters that are described in the following tables.
| Parameter | Description |
| --- | --- |
| Select Model | The model that is used for conversational search. For more information about available models, see Model management. Note: An available model is a model that can be used to test the effects of conversational search. |
| Prompt | The prompt that is used for conversational search. You must configure a prompt template in advance. For more information, see Manage prompts. |
Prompt parameters
| Parameter | Type | Required | Valid value | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| attitude | String | No | normal, polite, patience | normal | The tone of the conversation. |
| rule | String | No | detailed, stepbystep | detailed | The level of detail in the conversation. |
| noanswer | String | No | sorry, uncertain | sorry | The information that is returned if the system fails to find an answer to the question. |
| language | String | No | Chinese, English, Thai, Korean | Chinese | The language of the answer. |
| role | Boolean | No | true, false | true | Specifies whether to enable a custom role to answer the question. |
| role_name | String | No | - | AI Assistant | The name of the custom role. Example: AI Assistant. |
| out_format | String | No | text, table, list, markdown | text | The format of the answer. |
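As a hedged illustration, the snippet below groups the documented prompt parameters in a plain dictionary. The parameter names and values come from the preceding table; the surrounding structure is an assumption, not the actual request or console format.

```python
# Illustrative only: a dictionary that groups the documented prompt parameters.
# The parameter names and values match the table above; the surrounding structure
# is an assumption, not the actual OpenSearch request or console format.
prompt_params = {
    "attitude": "polite",         # tone: normal, polite, or patience
    "rule": "stepbystep",         # level of detail: detailed or stepbystep
    "noanswer": "uncertain",      # reply when no answer is found: sorry or uncertain
    "language": "English",        # answer language: Chinese, English, Thai, or Korean
    "role": True,                 # enable a custom role to answer the question
    "role_name": "AI Assistant",  # name of the custom role
    "out_format": "markdown",     # answer format: text, table, list, or markdown
}
```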
Document retrieval parameters
| Parameter | Type | Required | Valid value | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| filter | String | No | - | - | The field-based condition that is used to filter documents. Example: filter=field=value. |
| top_n | INT | No | (0, 50] | 5 | The number of documents to be retrieved. |
| sf | Float | No | [0, +∞) | 1.3 | The vector similarity threshold for the documents to be retrieved. A greater value indicates a smaller vector similarity. |
| dense_weight | Float | No | (0, 1) | 0.7 | The weight of the dense vector. This parameter is available if you select a sparse vector model. The weight of the sparse vector is calculated as 1 - dense_weight. |
| formula | String | No | - | Vector similarity | The formula based on which the retrieved documents are sorted. |
| operator | String | No | - | AND | The operator between text tokens during text retrieval. |
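Similarly, the sketch below groups the document retrieval parameters and shows how the sparse vector weight follows from dense_weight. Only the parameter names and the 1 - dense_weight relationship come from the table above; the structure and the sample filter value are assumptions.

```python
# Illustrative only: the documented document retrieval parameters grouped together.
# The dictionary structure and the sample filter value are assumptions, not the
# actual request format.
retrieval_params = {
    "filter": "category=faq",        # hypothetical field-based filter condition
    "top_n": 5,                      # number of documents to retrieve, in (0, 50]
    "sf": 1.3,                       # vector similarity threshold
    "dense_weight": 0.7,             # takes effect with a sparse vector model
    "formula": "Vector similarity",  # sorting formula for retrieved documents
    "operator": "AND",               # operator between text tokens
}

# Per the table above, the sparse vector weight is derived from dense_weight:
sparse_weight = 1 - retrieval_params["dense_weight"]  # 1 - 0.7 = 0.3
```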
Reference image parameters
| Parameter | Type | Required | Valid value | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| sf | Float | No | [0, +∞) | 1 | The vector similarity threshold for reference images. For sparse vector models, a greater value indicates a greater vector similarity. For dense vector models, a greater value indicates a smaller vector similarity. |
| dense_weight | Float | No | (0, 1) | 0.7 | The weight of the dense vector. This parameter is available if you select a sparse vector model. The weight of the sparse vector is calculated as 1 - dense_weight. |
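In both tables, dense_weight implies a weighted combination of dense and sparse similarity scores. The sketch below shows one plausible reading of that combination; only the relationship sparse weight = 1 - dense_weight is documented, and the combination formula itself is an assumption for illustration.

```python
# A plausible reading of how dense_weight combines dense and sparse similarity
# scores. Only the relationship sparse_weight = 1 - dense_weight is documented;
# the weighted-sum formula below is an assumption for illustration.
def combined_score(dense_score: float, sparse_score: float, dense_weight: float = 0.7) -> float:
    sparse_weight = 1 - dense_weight
    return dense_weight * dense_score + sparse_weight * sparse_score

print(combined_score(dense_score=0.82, sparse_score=0.4))  # 0.7 * 0.82 + 0.3 * 0.4 = 0.694
```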
Query understanding parameters
| Parameter | Type | Required | Valid value | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| query_extend | Boolean | No | true, false | false | Specifies whether to extend queries. You can enable this feature to improve the retrieval performance. |
| query_exten_num | INT | No | (0, +∞) | 5 | The number of queries to be extended. |
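As a hedged sketch of what query extension controls, the snippet below rewrites one user question into several retrieval queries. Only the two parameter names come from the table above; the rewriting logic is a placeholder.

```python
# Illustrative only: what the query understanding parameters control conceptually.
# The rewriting logic is a placeholder; only the parameter names come from the table.
query_understanding = {"query_extend": True, "query_exten_num": 3}

def extend_query(question: str, num: int) -> list[str]:
    # Placeholder rewrites; a real system generates semantically related queries.
    return [f"{question} (variant {i + 1})" for i in range(num)]

question = "How do I reset my password?"
queries = [question]
if query_understanding["query_extend"]:
    queries += extend_query(question, query_understanding["query_exten_num"])
print(queries)  # the original query plus 3 extended variants used for retrieval
```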
Manual intervention parameters
| Parameter | Type | Required | Valid value | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| sf | Float | No | [0, 2] | 0.3 | The threshold for manual intervention. A greater value makes it easier to match intervention entries. |
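One plausible reading of this threshold is sketched below: an intervention entry matches when its distance to the query falls within sf, so a greater value matches entries more easily. The distance-based rule is an assumption; only the parameter and its documented behavior are taken from the table.

```python
# Assumption for illustration: an intervention entry matches when its vector
# distance to the query is within the sf threshold, so a larger sf matches
# entries more easily. Only the parameter and its documented behavior come from
# the table; the distance-based rule below is a guess at the mechanism.
def matches_intervention(distance_to_query: float, sf: float = 0.3) -> bool:
    return distance_to_query <= sf

print(matches_intervention(0.25))          # True with the default sf of 0.3
print(matches_intervention(0.25, sf=0.2))  # False: a smaller sf is stricter
```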
Other parameters
| Parameter | Type | Required | Valid value | Default value | Description |
| --- | --- | --- | --- | --- | --- |
| return_hits | Boolean | No | true, false | false | Specifies whether to return the document retrieval results. |
| csi_level | String | No | none, loose, strict | strict | The content moderation level. none: the content is not moderated. loose: the results are moderated and blocked if restricted content is detected; in this case, no results are returned. strict: the results are moderated and blocked if restricted or suspicious content is detected; in this case, no results are returned. |
| history_max | INT | No | (0, 20] | 20 | The maximum number of conversation rounds based on which the system returns results. You can specify up to 20 rounds. |
| link | Boolean | No | true, false | false | Specifies whether to return the sources of the retrieved documents. |
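To tie the sections together, the sketch below assembles the documented parameters into a single hypothetical evaluation task configuration. The grouping keys and overall structure are assumptions; only the individual parameter names and default values come from the preceding tables.

```python
# Illustrative only: one way to assemble the documented parameters into a single
# evaluation task configuration. The grouping keys and overall structure are
# assumptions; the parameter names and values come from the tables above.
evaluation_task = {
    "task_name": "conversational-search-eval-demo",   # hypothetical task name
    "prompt_params": {"attitude": "normal", "rule": "detailed", "noanswer": "sorry",
                      "language": "English", "role": True, "role_name": "AI Assistant",
                      "out_format": "text"},
    "retrieval_params": {"top_n": 5, "sf": 1.3, "dense_weight": 0.7,
                         "formula": "Vector similarity", "operator": "AND"},
    "query_understanding": {"query_extend": False, "query_exten_num": 5},
    "intervention": {"sf": 0.3},
    "other_params": {"return_hits": False, "csi_level": "strict",
                     "history_max": 20, "link": False},
}
```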
After you configure the preceding parameters, click OK. The system generates an overall score after the evaluation is complete.
Click Evaluation Report to view the evaluation results of each Q&A pair. If the evaluation results are inaccurate, click Manual Evaluation to manually revise the evaluation results.