OpenSearch: Manage evaluation tasks

Last Updated: Apr 01, 2026

Use evaluation tasks to measure the end-to-end quality of conversational search and to compare configurations (model, prompt, and retrieval settings) so that you can identify which setup produces the best results for your use case. The evaluated pipeline covers three stages: a user submits a question, the system retrieves relevant documents, and a large language model (LLM) generates an answer.

Evaluation tasks are billed based on the computing resources consumed during the evaluation.

Prerequisites

Before you begin, make sure you have:

  • An OpenSearch LLM-Based Conversational Search Edition instance

  • A prompt template configured. For details, see Manage prompts.

  • An evaluation dataset ready to use

Create an evaluation task

  1. Log on to the OpenSearch console.

  2. In the top navigation bar, select the region where your instance resides. In the upper-left corner, select OpenSearch LLM-Based Conversational Search Edition.

  3. On the Instance Management page, find your instance and click Manage in the Actions column. On the instance details page, click Effect Comparison in the left-side pane.

  4. On the Evaluation Task tab, click Create Evaluation Task.

  5. Enter a task name, select an evaluation dataset, and click Configure Parameters.

  6. In the Configure Parameters panel, set the parameters described in the following sections, then click OK.

After the evaluation completes, the system generates an overall score. Click Evaluation Report to view per-Q&A-pair scores and details. If any result appears inaccurate, click Manual Evaluation to revise the score manually.
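When two evaluation tasks run against the same dataset, the per-Q&A-pair scores from their reports can be compared side by side. The following is a minimal Python sketch, assuming the scores have been transcribed by hand from each Evaluation Report into a dictionary keyed by a question ID. The function name, the ID format, and the score values are all hypothetical, and the console's actual score scale is not assumed.

```python
def regressions(baseline: dict[str, float], candidate: dict[str, float]) -> list[str]:
    """Return IDs of Q&A pairs scored in both runs where the candidate is lower."""
    # Only compare pairs that appear in both reports.
    shared = baseline.keys() & candidate.keys()
    return sorted(qid for qid in shared if candidate[qid] < baseline[qid])


if __name__ == "__main__":
    # Made-up scores for illustration only.
    task_a = {"q1": 4.0, "q2": 3.0, "q3": 5.0}
    task_b = {"q1": 4.5, "q2": 2.0, "q3": 5.0}
    print(regressions(task_a, task_b))  # ['q2']
```

Pairs flagged this way are natural candidates to inspect in the report, and to rescore with Manual Evaluation if the automatic score looks wrong.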

Parameter reference

Select model and prompt

| Parameter | Description |
| --- | --- |
| Select Model | The model used for conversational search. For available models, see Model management. An available model is one that can be used to test the effects of conversational search. |
| Prompt | The prompt template used for conversational search. Configure a prompt template before creating the task. For details, see Manage prompts. |

Prompt parameters

These parameters control how the LLM generates answers.

| Parameter | Type | Required | Default | Valid values | Description |
| --- | --- | --- | --- | --- | --- |
| attitude | String | No | normal | normal, polite, patience | The tone of the conversation. |
| rule | String | No | simple | detailed, stepbystep | The level of detail in the answer. |
| noanswer | String | No | sorry | sorry, uncertain | The response returned when the system cannot find an answer. |
| language | String | No | Chinese | Chinese, English, Thai, Korean | The language of the generated answer. |
| role | Boolean | No | true | | Specifies whether to use a custom role to answer questions. |
| role_name | String | No | AI Assistant | | The name of the custom role. Example: AI Assistant. |
| out_format | String | No | text | text, table, list, markdown | The format of the generated answer. |
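Because several prompt parameters accept only a fixed set of values, it can help to check a planned configuration before entering it in the console. The sketch below is illustrative only: the helper and dictionary names are hypothetical, and the allowed values are copied from the table above for four of the enumerated parameters.

```python
# Allowed values copied from the prompt parameter table.
# PROMPT_ENUMS and check_prompt_params are hypothetical names, not part of any API.
PROMPT_ENUMS = {
    "attitude": {"normal", "polite", "patience"},
    "noanswer": {"sorry", "uncertain"},
    "language": {"Chinese", "English", "Thai", "Korean"},
    "out_format": {"text", "table", "list", "markdown"},
}


def check_prompt_params(params: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no problems."""
    problems = []
    for name, value in params.items():
        allowed = PROMPT_ENUMS.get(name)
        # Parameters without an enumerated value set are not checked here.
        if allowed is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    return problems


if __name__ == "__main__":
    print(check_prompt_params({"attitude": "polite", "out_format": "markdown"}))
    print(check_prompt_params({"language": "French"}))
```

The first call reports no problems, while the second reports that French is not a supported answer language.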

Document retrieval parameters

These parameters control how the system retrieves documents from your data source.

| Parameter | Type | Required | Default | Valid values | Description |
| --- | --- | --- | --- | --- | --- |
| filter | String | No | | | The field expression used to filter documents. Example: filter = field = value. |
| top_n | INT | No | 5 | (0, 50] | The number of documents to retrieve. |
| sf | Float | No | 1.3 | [0, +∞) | The vector similarity threshold for retrieved documents. A higher value means lower similarity is required. |
| dense_weight | Float | No | 0.7 | (0, 1) | The weight of the dense vector. Available when a sparse vector model is selected. The sparse vector weight equals 1 - dense_weight. |
| formula | String | No | Vector similarity | | The formula used to rank retrieved documents. |
| operator | String | No | AND | | The operator applied between text tokens during text retrieval. |
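The numeric ranges and the relationship between the dense and sparse weights can be expressed as a small check. This is a sketch under stated assumptions: the function name is hypothetical, `sparse_weight` is a derived value shown for illustration rather than a parameter you set, and the ranges and defaults are taken from the table above.

```python
def check_retrieval_params(top_n: int = 5, sf: float = 1.3, dense_weight: float = 0.7) -> dict:
    """Validate the documented ranges and derive the sparse vector weight."""
    if not (0 < top_n <= 50):
        raise ValueError("top_n must be in (0, 50]")
    if sf < 0:
        raise ValueError("sf must be in [0, +inf)")
    if not (0 < dense_weight < 1):
        raise ValueError("dense_weight must be in (0, 1)")
    return {
        "top_n": top_n,
        "sf": sf,
        "dense_weight": dense_weight,
        # Only meaningful when a sparse vector model is selected.
        "sparse_weight": 1 - dense_weight,
    }


if __name__ == "__main__":
    params = check_retrieval_params(dense_weight=0.7)
    # Rounded for display because 1 - 0.7 is not exact in floating point.
    print(round(params["sparse_weight"], 2))  # 0.3
```

With the default dense_weight of 0.7, the sparse vector therefore contributes a weight of 0.3.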

Reference image parameters

These parameters apply when your retrieval pipeline includes image data.

| Parameter | Type | Required | Default | Valid values | Description |
| --- | --- | --- | --- | --- | --- |
| sf | Float | No | 1 | [0, +∞) | The vector similarity threshold for reference images. For sparse vector models, a higher value means greater similarity. For dense vector models, a higher value means lower similarity. |
| dense_weight | Float | No | 0.7 | (0, 1) | The weight of the dense vector. The sparse vector weight equals 1 - dense_weight. |

Query understanding parameters

These parameters control how the system interprets and expands the user's query before retrieval.

| Parameter | Type | Required | Default | Valid values | Description |
| --- | --- | --- | --- | --- | --- |
| query_extend | Boolean | No | false | | Specifies whether to enable query expansion. Enabling this can improve retrieval performance. |
| query_exten_num | INT | No | 5 | (0, +∞) | The number of expanded queries to generate. |

Manual intervention parameters

| Parameter | Type | Required | Default | Valid values | Description |
| --- | --- | --- | --- | --- | --- |
| sf | Float | No | 0.3 | [0, 2] | The similarity threshold for matching manual intervention entries. A higher value makes it easier to trigger a match. |

Other parameters

| Parameter | Type | Required | Default | Valid values | Description |
| --- | --- | --- | --- | --- | --- |
| return_hits | Boolean | No | false | | Specifies whether to include document retrieval results in the response. |
| csi_level | String | No | strict | none, loose, strict | Content moderation level. none: no moderation. loose: blocks restricted content. strict: blocks restricted and suspicious content. |
| history_max | INT | No | 20 | (0, 20] | The maximum number of conversation rounds the system uses to generate a response. |
| link | Boolean | No | false | | Specifies whether to return the source of each retrieved document. |
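When planning a comparison, it is useful to record each configuration as the documented defaults plus a small set of overrides, so that two evaluation tasks differ in only the settings you intend to test. The sketch below assumes nothing about the service API: the section keys (`retrieval`, `reference_image`, `manual_intervention`, `other`) and the function name are hypothetical labels, chosen because `sf` and `dense_weight` appear in more than one section with different defaults. Default values are copied from the tables on this page.

```python
import copy

# Documented defaults, grouped by section so the three sf values stay distinct.
DEFAULTS = {
    "retrieval": {"top_n": 5, "sf": 1.3, "dense_weight": 0.7},
    "reference_image": {"sf": 1.0, "dense_weight": 0.7},
    "manual_intervention": {"sf": 0.3},
    "other": {"return_hits": False, "csi_level": "strict", "history_max": 20, "link": False},
}


def with_overrides(overrides: dict) -> dict:
    """Return a copy of DEFAULTS with the given per-section overrides applied."""
    config = copy.deepcopy(DEFAULTS)
    for section, values in overrides.items():
        if section not in config:
            raise KeyError(f"unknown section: {section}")
        for name, value in values.items():
            # Reject misspelled names instead of silently adding them.
            if name not in config[section]:
                raise KeyError(f"unknown parameter: {section}.{name}")
            config[section][name] = value
    return config


if __name__ == "__main__":
    baseline = with_overrides({})
    candidate = with_overrides({"retrieval": {"top_n": 10}})
    print(baseline["retrieval"]["top_n"], candidate["retrieval"]["top_n"])  # 5 10
```

Changing one parameter per task, as in the candidate above, makes it easier to attribute a score difference to that parameter.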

Related topics

  • Model management: review the models available for your evaluation task

  • Manage prompts: configure prompt templates before creating an evaluation task