Use an evaluation task to evaluate the effects of conversational search

This topic describes how to create an evaluation task to evaluate the effects of conversational search. The conversational search process to be evaluated consists of the following three steps: (1) A user asks a question. (2) The system retrieves relevant content. (3) A large language model (LLM) generates an answer.

Usage notes

You are charged for effect evaluation based on the computing resources consumed during the evaluation.

Procedure

Log on to the OpenSearch console.
In the top navigation bar, select the region in which your instance resides. In the upper-left corner, select OpenSearch LLM-Based Conversational Search Edition.
On the Instance Management page, find the instance that you want to manage and click Manage in the Actions column. On the details page of the instance, click Effect Comparison in the left-side pane.

On the Evaluation Task tab, click Create Evaluation Task. On the Create Evaluation Task page, enter a task name, select an evaluation dataset, and then click Configure Parameters. In the Configure Parameters panel, configure the parameters that are described in the following tables.

Parameter	Description
Select Model	The model used for conversational search. For more information about available models, see Model management. Note: An available model is one that can be used to test the effects of conversational search.
Prompt	The prompt used for conversational search. You must configure a prompt template in advance. For more information, see Manage prompts.

Prompt parameters
Parameter	Type	Required	Valid value	Default value	Description
attitude	String	No	-	normal	The tone of the conversation. Default value: normal. Valid values: normal, polite, and patience.
rule	String	No	-	detailed	The level of detail in the conversation. Default value: detailed. Valid values: detailed and stepbystep.
noanswer	String	No	-	sorry	The information that is returned if the system fails to find an answer to the question. Default value: sorry. Valid values: sorry and uncertain.
language	String	No	-	Chinese	The language of the answer. Default value: Chinese. Valid values: Chinese, English, Thai, and Korean.
role	Boolean	No	-	true	Specifies whether to enable a custom role to answer the question.
role_name	String	No	-	AI Assistant	The custom role. Example: AI Assistant.
out_format	String	No	-	text	The format of the answer. Default value: text. Valid values: text, table, list, and markdown.
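
The enumerated values in the prompt parameters table can be checked before a configuration is entered in the console. The following is a minimal sketch, not part of OpenSearch: the helper name `invalid_prompt_params` and the dictionary layout are assumptions made for illustration, and only the parameter names and valid values come from the table above.

```python
# Valid values copied from the prompt parameters table.
# The dictionary and helper below are hypothetical, not an OpenSearch API.
PROMPT_VALID_VALUES = {
    "attitude": {"normal", "polite", "patience"},
    "rule": {"detailed", "stepbystep"},
    "noanswer": {"sorry", "uncertain"},
    "language": {"Chinese", "English", "Thai", "Korean"},
    "out_format": {"text", "table", "list", "markdown"},
}


def invalid_prompt_params(params):
    """Return the sorted names of parameters whose value is not documented.

    Parameters without an enumerated value list (for example, role_name)
    are not checked.
    """
    return sorted(
        name
        for name, value in params.items()
        if name in PROMPT_VALID_VALUES and value not in PROMPT_VALID_VALUES[name]
    )
```

For example, `invalid_prompt_params({"attitude": "polite", "language": "French"})` reports only `language`, because French is not listed as a valid value.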

Document retrieval parameters
Parameter	Type	Required	Valid value	Default value	Description
filter	String	No	-	-	The field that is used to filter documents. Example: filter = field = value.
top_n	INT	No	(0, 50]	5	The number of documents to be retrieved.
sf	Float	No	[0,+∞)	1.3	The threshold for determining the vector similarity of the documents to be retrieved. A greater value indicates a smaller vector similarity.
dense_weight	Float	No	(0,1)	0.7	The weight of the dense vector. This parameter is available if you select a sparse vector model. The weight of the sparse vector is calculated in the following way: 1 - Value of the dense_weight parameter.
formula	String	No	-	Vector similarity	The formula based on which the retrieved documents are sorted.
operator	String	No	-	AND	The operator between text tokens during text retrieval.
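
The numeric ranges and the sparse vector weight in the document retrieval table can be expressed in a few lines. The following is a minimal sketch under stated assumptions: the function names `sparse_weight` and `retrieval_param_errors` are hypothetical, and only the ranges, the default values, and the formula 1 - dense_weight come from the table above.

```python
def sparse_weight(dense_weight):
    """Return the sparse vector weight, 1 - dense_weight.

    dense_weight must lie in the open interval (0, 1), as the table states.
    """
    if not 0 < dense_weight < 1:
        raise ValueError("dense_weight must be in the open interval (0, 1)")
    return 1 - dense_weight


def retrieval_param_errors(top_n=5, sf=1.3, dense_weight=0.7):
    """Return a list of range violations; an empty list means all values fit.

    The defaults are the default values listed in the table.
    """
    errors = []
    if not 0 < top_n <= 50:
        errors.append("top_n must be in (0, 50]")
    if sf < 0:
        errors.append("sf must be in [0, +inf)")
    if not 0 < dense_weight < 1:
        errors.append("dense_weight must be in (0, 1)")
    return errors
```

With the default dense_weight of 0.7, the sparse vector weight is approximately 0.3. Floating-point arithmetic may not return exactly 0.3, so compare the result with a tolerance.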

Reference image parameters
Parameter	Type	Required	Valid value	Default value	Description
sf	Float	No	[0,+∞)	1	The threshold for determining the vector similarity of reference images. For sparse vector models, a greater value indicates a greater vector similarity. For dense vector models, a greater value indicates a smaller vector similarity.
dense_weight	Float	No	(0,1)	0.7	The weight of the dense vector. This parameter is available if you select a sparse vector model. The weight of the sparse vector is calculated in the following way: 1 - Value of the dense_weight parameter.

Query understanding parameters
Parameter	Type	Required	Valid value	Default value	Description
query_extend	Boolean	No	-	false	Specifies whether to extend queries. You can enable this feature to improve the retrieval performance.
query_exten_num	INT	No	(0,+∞)	5	The number of queries to be extended.

Manual intervention parameters
Parameter	Type	Required	Valid value	Default value	Description
sf	Float	No	[0,2]	0.3	The threshold for manual intervention. Default value: 0.3. A greater value makes it easier to match intervention entries.

Other parameters
Parameter	Type	Required	Valid value	Default value	Description
return_hits	Boolean	No	-	false	Specifies whether to return the document retrieval results.
csi_level	String	No	-	strict	The configurations for content moderation. Valid values: none (does not moderate the content); loose (moderates the results and blocks them if restricted content is detected, in which case no results are returned); strict (moderates the results and blocks them if restricted or suspicious content is detected, in which case no results are returned).
history_max	INT	No	(0,20]	20	The maximum number of rounds of conversations based on which the system returns results. You can specify up to 20 rounds.
link	Boolean	No	-	false	Specifies whether to return the source of the retrieved document.
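
The three csi_level values differ only in which kind of detected content causes results to be blocked. The following is a minimal sketch of that decision as described in the table above. The function `is_blocked` and its `restricted` and `suspicious` flags are hypothetical names used for illustration and do not reflect how the moderation service is implemented.

```python
def is_blocked(csi_level, restricted, suspicious):
    """Return True if results would be blocked under the given csi_level.

    restricted and suspicious indicate whether moderation detected
    restricted or suspicious content (illustrative flags).
    """
    if csi_level == "none":
        # Content is not moderated, so nothing is blocked.
        return False
    if csi_level == "loose":
        # Only restricted content blocks the results.
        return restricted
    if csi_level == "strict":
        # Restricted or suspicious content blocks the results.
        return restricted or suspicious
    raise ValueError("csi_level must be none, loose, or strict")
```

Under this reading, content that is only suspicious is returned when csi_level is loose and blocked when csi_level is strict, which is the default value.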

After you configure the preceding parameters, click OK. The system generates an overall score after the evaluation is complete.
Click Evaluation Report to view the evaluation results of each Q&A pair. If the evaluation results are inaccurate, click Manual Evaluation to manually revise the evaluation results.