Build an evaluation dataset with at least 50 question-answer pairs by using the provided sample template, and then run evaluation tasks to measure Q&A performance.
Procedure
- Log on to the OpenSearch console.
- Select the destination region and switch to OpenSearch LLM-Based Conversational Search Edition.
- In the instance list, click Manage on the right side of the destination instance. In the left-side navigation pane, select Effect Comparison.
- On the Evaluation Dataset tab, click Create Evaluation Dataset, enter the evaluation dataset name, and upload the evaluation dataset in Excel or JSON format according to the Data Sample.
Note: To obtain accurate evaluation results, the evaluation dataset must contain at least 50 questions.
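
Before uploading, it can help to check locally that a JSON dataset meets the 50-question minimum. The following is a minimal Python sketch, not part of the console or any OpenSearch SDK. The `question` and `answer` field names and the flat list-of-records layout are assumptions made for illustration; the actual schema is defined by the Data Sample in the console.

```python
import json

MIN_QUESTIONS = 50  # minimum stated in the note above


def validate_eval_dataset(records, min_questions=MIN_QUESTIONS):
    """Return a list of problems found in the dataset (empty if none).

    `records` is assumed to be a list of dicts with hypothetical
    "question" and "answer" keys; adjust the keys to match the
    Data Sample downloaded from the console.
    """
    problems = []
    valid = 0
    for i, rec in enumerate(records):
        question = str(rec.get("question") or "").strip()
        answer = str(rec.get("answer") or "").strip()
        if not question or not answer:
            problems.append(f"record {i}: missing question or answer")
        else:
            valid += 1
    if valid < min_questions:
        problems.append(
            f"only {valid} complete pairs; at least {min_questions} are required"
        )
    return problems


def load_json_dataset(path):
    """Load a JSON file that is assumed to hold a list of records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `validate_eval_dataset(load_json_dataset("my_eval_set.json"))` returns an empty list when every record is complete and there are at least 50 of them. The file name here is a placeholder.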

Download: Downloads the evaluation dataset.
Delete: Removes the evaluation dataset from the evaluation task.
Next step
Test datasets
1. University website admission dataset
This document library is sourced from the Renmin University of China admissions website and covers admission policies, department introductions, and related topics.
Source: https://arxiv.org/abs/2406.05654
Dataset: domainrag_xlsx_corpus.xlsx
Q&A set: basic_qa_anslen1.xlsx
2. Question answering based on insurance clauses
This document library contains clauses from various insurance products.
Source: https://tianchi.aliyun.com/competition/entrance/532194/information
Dataset: tianchi_doc_with_title.json
Q&A set: dev_qa_sample_50_for_llm.xlsx
3. CRUD news dataset
This document library contains articles from Chinese news websites published after July 2023.
Source: https://arxiv.org/abs/2401.17043
Dataset (split into three parts due to console size requirements):
Q&A set: crud_1doc_qa_sample100_for_llm.xlsx
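
The CRUD document library above is provided in three parts because of console size requirements. The source does not state the size limit or how the split was made. As one possible approach for a large corpus of your own, the sketch below splits a list of records into a chosen number of contiguous, near-equal parts. The function name and the default of three parts are illustrative assumptions.

```python
def split_into_parts(records, num_parts=3):
    """Split a list into `num_parts` contiguous chunks of near-equal size.

    Hypothetical helper: earlier chunks receive one extra record when
    the total does not divide evenly.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    base, extra = divmod(len(records), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        size = base + (1 if i < extra else 0)
        parts.append(records[start:start + size])
        start += size
    return parts
```

Each returned part can then be written to its own file and uploaded separately. Splitting by record count does not guarantee a given file size, so check the resulting files against the console's limit.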