This topic describes the MultiSearch unified question and answer (Q&A) API. You can use this API to retrieve content from a knowledge base and perform text-based and table-based Q&A queries.
Prerequisites
Obtain an API key. You must use this key for identity authentication when you call operations of OpenSearch LLM-Based Conversational Search. For more information, see Manage API keys.
Obtain a service endpoint. You must provide a service endpoint when you call the operations of OpenSearch LLM-Based Conversational Search. For more information, see Obtain service endpoints.
Precautions
If a custom table exists, a table-based Q&A query is performed first. Otherwise, a text-based Q&A query is performed.
If only one custom table exists, the Q&A query is performed on that table. If multiple custom tables exist, the system automatically selects the table most relevant to the user's question and performs the Q&A query on that table.
If the table-based Q&A query returns no results, a text-based Q&A query is performed.
If you use this API and multiple custom tables exist, ensure that the tables are highly distinct from each other. Otherwise, incorrect answers may be returned.
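The routing rules above can be sketched as follows. This is an illustrative sketch only: the function and parameter names are hypothetical, and the service performs this selection internally.

```python
def route_question(question, custom_tables, table_qa, text_qa, pick_most_relevant):
    """Mimic the documented Q&A routing rules (illustrative only)."""
    if not custom_tables:
        # No custom table: perform a text-based Q&A query.
        return text_qa(question)
    if len(custom_tables) == 1:
        # Exactly one custom table: query that table.
        table = custom_tables[0]
    else:
        # Multiple tables: the system selects the most relevant one.
        table = pick_most_relevant(question, custom_tables)
    answer = table_qa(question, table)
    if answer is None:
        # Table-based Q&A returned no results: fall back to text-based Q&A.
        return text_qa(question)
    return answer
```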
API information
Request method | Protocol | Request data format |
POST | HTTP | JSON |
Request URL
{host}/v3/openapi/apps/{app_group_identity}/actions/multi-search
{host}: The service endpoint. You can call the API over the Internet or through a VPC. For more information, see Obtain service endpoints.
{app_group_identity}: The application name. To obtain the application name, log on to the OpenSearch LLM-Based Conversational Search console and view the application name for the instance on the Instance Management page.
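For illustration, the following sketch assembles the request URL and headers in Python. The endpoint, application name, and API key below are placeholder values, not real credentials; replace them with your own. The commented-out lines show how the request could be sent with the standard library.

```python
import json
from urllib import request

# Hypothetical values: substitute your own endpoint, application name, and API key.
host = "http://opensearch-cn-hangzhou.aliyuncs.com"  # placeholder endpoint
app_group_identity = "my_app"                        # placeholder application name
api_key = "OS-d1**2a"                                # placeholder API key

url = f"{host}/v3/openapi/apps/{app_group_identity}/actions/multi-search"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",  # the value must start with "Bearer"
}

body = {"question": {"text": "user question", "type": "TEXT"}}

# Sending the request (requires network access):
# req = request.Request(url, data=json.dumps(body).encode("utf-8"),
#                       headers=headers, method="POST")
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```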
Request parameters
Header parameters
Parameter | Type | Required | Description | Example |
Content-Type | string | Yes | The data format of the request. Set the value to "application/json". | application/json |
Authorization | string | Yes | The API key for request authentication. The value must start with Bearer. | Bearer OS-d1**2a |
accept | string | No | For Server-Sent Events (SSE) requests, set the value to "text/event-stream". | text/event-stream |
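If you request streaming results over SSE, the response arrives as an event stream in which each payload is carried on a "data:" line. The following sketch parses such a stream; the SSE line framing follows the standard, but the JSON shape of each payload is an assumption for illustration.

```python
import json

def iter_sse_data(lines):
    """Yield the payload of each "data:" line in an SSE stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield line[len("data:"):].strip()

# Synthetic stream for illustration; the real payload fields may differ.
stream = [
    'data: {"answer": "partial "}',
    "",
    'data: {"answer": "text"}',
    "",
]
chunks = [json.loads(d)["answer"] for d in iter_sse_data(stream)]
print("".join(chunks))  # -> partial text
```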
Body parameters
Parameter | Type | Required | Description | Example |
question | map | Yes | The input question. | { "text":"user question", "type": "TEXT", "session" : "" } |
question.text | string | Yes | The text of the input question. | user question |
question.session | string | No | The session ID for a multi-round conversation. This ID identifies the context of the conversation. If you specify this parameter, the multi-round conversation feature is enabled. | 1725530408586 |
question.type | string | No | The type of the input question. Set the value to TEXT. | TEXT |
options | map | No | The additional request parameters that control retrieval, models, prompts, and more. | |
options.chat | map | No | The parameters related to large language model (LLM) access. | |
options.chat.disable | boolean | No | Specifies whether to disable LLM access and directly return document retrieval results. The default value is false, which indicates that LLM generation is enabled. | false |
options.chat.stream | boolean | No | Specifies whether to return results in a stream. The default value is false. | true |
options.chat.model | string | No | The LLM to use. The available models vary by region, such as the Singapore region. | opensearch-llama2-13b |
options.chat.enable_deep_search | boolean | No | Specifies whether to enable deep search. | false |
options.chat.model_generation | integer | No | The version of the product-customized model to use. If this parameter is not specified, the earliest version is used. | 20 |
options.chat.prompt_template | string | No | The name of the custom prompt template. If this parameter is empty, the system's built-in prompt template is used. | user_defined_prompt_name |
options.chat.prompt_config | object | No | The key-value pairs configured in the custom prompt, in the format of {"key": "value"}. | |
options.chat.prompt_config.attitude | string | No | A parameter in the built-in template that controls the tone of the conversation. The default value is normal. | normal |
options.chat.prompt_config.rule | string | No | The level of detail in the conversation. The default value is detailed. | detailed |
options.chat.prompt_config.noanswer | string | No | The response when an answer cannot be found. The default value is sorry. | sorry |
options.chat.prompt_config.language | string | No | The language used for the answer. The default value is Chinese. | Chinese |
options.chat.prompt_config.role | boolean | No | Specifies whether to enable a custom role for the answer. If you set this parameter to true, the role specified by role_name is used. | false |
options.chat.prompt_config.role_name | string | No | The custom role for the answer. Example: AI Assistant. | AI Assistant |
options.chat.prompt_config.out_format | string | No | The format of the output content. The default value is text. | text |
options.chat.generate_config.repetition_penalty | float | No | Controls the repetition of consecutive sequences in the model's output. Increasing the repetition_penalty reduces the repetition in the generated text. A value of 1.0 means no penalty. There is no strict value range. | 1.01 |
options.chat.generate_config.top_k | integer | No | The size of the candidate set for sampling during generation. For example, a value of 50 means that only the top 50 tokens with the highest scores in a single generation are used to form the random sampling candidate set. A larger value increases the randomness of the generated text. A smaller value increases the determinism. The default value is 0, which disables the top_k strategy. In this case, only the top_p strategy takes effect. | 50 |
options.chat.generate_config.top_p | float | No | The probability threshold for nucleus sampling during generation. For example, a value of 0.8 means that only the smallest set of most likely tokens with a cumulative probability of 0.8 or higher is retained as the candidate set. The value must be in the range of (0, 1.0). A larger value increases the randomness of the generated text. A smaller value increases the determinism. | 0.5 |
options.chat.generate_config.temperature | float | No | Controls the degree of randomness and diversity. Specifically, the temperature value controls the degree of smoothing applied to the probability distribution of each candidate word during text generation. A higher temperature value flattens the probability distribution, allowing more low-probability words to be selected and making the output more diverse. A lower temperature value sharpens the probability distribution, making high-probability words more likely to be selected and making the output more deterministic. The value must be in the range of [0, 2). We recommend that you do not set this parameter to 0. This parameter requires SDK for Python V1.10.1 or later, or SDK for Java V2.5.1 or later. | 0.7 |
options.chat.history_max | integer | No | The maximum number of historical rounds for a multi-round conversation. The maximum value is 20. The default value is 1. | 20 |
options.chat.link | boolean | No | Specifies whether to return links. This controls whether the content generated by the model indicates the source of the reference. If the content includes a source, the answer contains reference numbers that correspond to entries in the reference list. | false |
options.chat.rich_text_strategy | string | No | The post-processing method for the output of a rich text LLM. If this parameter is not configured or is empty, the rich text feature is disabled and the default behavior is used. | inside_response |
options.chat.agent | map | No | The options for configuring retrieval-augmented generation (RAG) tool capabilities. If this feature is enabled, the model decides whether to execute the corresponding tool based on the existing content. Only specific LLMs support this feature. | |
options.chat.agent.think_process | boolean | No | Specifies whether to return the thinking process. | true |
options.chat.agent.max_think_round | integer | No | The number of thinking rounds. The maximum value is 20. | 10 |
options.chat.agent.language | string | No | The language for the thinking process and the answer. Valid values: AUTO: automatically determines whether to use Chinese or English based on the user query. CN: Chinese. EN: English. | AUTO |
options.chat.agent.tools | list of string | No | The names of the RAG tools to use. Only the knowledge_search tool is available. | ["knowledge_search"] |
options.retrieve | map | No | The parameters that control retrieval, such as knowledge base retrieval, web search, query rewriting, and reranking. | |
options.retrieve.web_search.enable | boolean | No | Specifies whether to enable web search. The default value is false. | false |
options.retrieve.doc | map | No | The parameters that control document retrieval from the knowledge base. | |
options.retrieve.doc.disable | boolean | No | Specifies whether to disable knowledge base retrieval. The default value is false. | false |
options.retrieve.doc.filter | string | No | Filters the data retrieved from the knowledge base. By default, this parameter is empty. For more information about how to use the filter parameter, see filter parameter. Example format: category=\"value1\". | category=\"value1\" |
options.retrieve.doc.sf | float | No | The threshold for the vector score in vector retrieval. | 0.35 |
options.retrieve.doc.top_n | integer | No | The number of documents to retrieve. The default value is 5. The value must be in the range of (0, 50]. | 5 |
options.retrieve.doc.formula | string | No | The formula for sorting documents during retrieval. Note For more information about the syntax, see Fine sort functions. The algorithm relevance and geographical location relevance features are not supported. | -timestamp: Sorts documents in descending order by the timestamp field. |
options.retrieve.doc.rerank_size | integer | No | The number of documents to rerank when the rerank feature is enabled. The default value is 30. The value must be in the range of (0, 100]. | 30 |
options.retrieve.doc.operator | string | No | The relationship between the terms after the question.text is tokenized for knowledge base retrieval. This parameter takes effect only when sparse vectors are not enabled. Valid values: AND and OR. The default value is AND. | AND |
options.retrieve.doc.dense_weight | float | No | The weight of the dense vector during document retrieval when sparse vectors are enabled. The value must be in the range of (0.0, 1.0). The default value is 0.7. | 0.7 |
options.retrieve.entry | map | No | The parameters that control the retrieval of results from manually intervened data. | |
options.retrieve.entry.disable | boolean | No | Specifies whether to disable the retrieval of manually intervened data. The default value is false. | false |
options.retrieve.entry.sf | float | No | The threshold for the vector score for retrieving manually intervened data. The value must be in the range of [0, 2.0]. The default value is 0.3. A smaller value indicates that the results are more relevant, but fewer results are returned. A larger value may retrieve less relevant results. | 0.3 |
options.retrieve.image | map | No | The parameters that control the retrieval of image results from the knowledge base. | |
options.retrieve.image.disable | boolean | No | Specifies whether to disable the retrieval of image data. The default value is false. | false |
options.retrieve.image.sf | float | No | The threshold for the vector score in vector retrieval of image data. The default value is 1.0. | 1.0 |
options.retrieve.image.dense_weight | float | No | The weight of the dense vector during image retrieval when sparse vectors are enabled. The value must be in the range of (0.0, 1.0). The default value is 0.7. | 0.7 |
options.retrieve.qp | map | No | The options for query rewriting. | |
options.retrieve.qp.query_extend | boolean | No | Specifies whether to expand the user query. The expanded queries are used to retrieve document segments in the engine. The default value is false. | false |
options.retrieve.qp.query_extend_num | integer | No | The maximum number of queries to expand when similar query expansion is enabled. The default value is 5. | 5 |
options.retrieve.rerank | map | No | The options for reranking during document retrieval. | |
options.retrieve.rerank.enable | boolean | No | Specifies whether to use a model to rerank the retrieved results based on relevance. The default value is true. | true |
options.retrieve.rerank.model | string | No | The name of the LLM used for reranking. | ops-bge-reranker-larger |
options.retrieve.return_hits | boolean | No | Specifies whether to return the document retrieval results in the response, which is the search_hits parameter. | false |
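The generate_config sampling parameters described above can be understood by how they reshape the model's next-token distribution. The following sketch is illustrative, not the service implementation: temperature rescales the logits before the softmax, and top_p keeps the smallest high-probability candidate set (nucleus sampling).

```python
import math

def softmax(logits, temperature=1.0):
    # A higher temperature flattens the distribution; a lower one sharpens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_candidates(probs, top_p):
    # Keep the smallest set of highest-probability tokens whose cumulative
    # probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept

logits = [2.0, 1.0, 0.5, 0.1]
sharp = softmax(logits, temperature=0.5)
flat = softmax(logits, temperature=2.0)
# The top token dominates more at low temperature than at high temperature.
print(sharp[0] > flat[0])  # -> True
print(top_p_candidates(softmax(logits), 0.8))  # -> [0, 1, 2]
```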
Sample request body
{
"question" : {
"text" : "user question",
"session" : "The session of the conversation. If you specify this parameter, the multi-round conversation feature is enabled.",
"type" : "TEXT"
},
"options": {
"chat": {
"disable" : false, # Specifies whether to disable LLM access and directly return document retrieval results. The default value is false, which indicates that LLM generation is enabled.
"stream" : false, # Specifies whether to return the results in a stream. The default value is false.
"model" : "Qwen", # The LLM to use.
"prompt_template" : "user_defined_prompt_name", # The name of the custom prompt.
"prompt_config" : { # The custom prompt configuration. This parameter is optional.
"key" : "value" # Replace the key and value with a specific key-value pair.
},
"generate_config" : {
"repetition_penalty": 1.01,
"top_k": 50,
"top_p": 0.5,
"temperature": 0.7
},
"history_max": 20, # The maximum number of historical rounds for a multi-round conversation.
"link": false, # Return links.
"agent":{
"tools":["knowledge_search"]
}
},
"retrieve": {
"doc": {
"disable": false, # Specifies whether to disable document retrieval. The default value is false.
"filter": "category=\"type\"", # Filters documents by the category field during retrieval. By default, this parameter is empty.
"sf": 0.35, # The vector retrieval threshold.
"top_n": 5, # The number of documents to retrieve. The default value is 5. The value must be in the range of (0, 50].
"formula" : "", # The default is vector similarity.
"rerank_size" : 5, # The number of documents for fine sorting. You do not need to set this parameter. The system determines the value.
"operator":"OR" # Specifies that the relationship between text tokens is OR during text retrieval. The default is AND.
},
"web_search":{
"enable": false # Specifies whether to enable web search. The default is false.
},
"entry": {
"disable": false, # Specifies whether to disable the retrieval of manually intervened data. The default value is false.
"sf": 0.3 # The vector relevance for the retrieval of intervened data. The default value is 0.3.
},
"image": {
"disable": false, # Specifies whether to disable image data retrieval. The default value is false.
"sf": 1.0 # The vector relevance for image data retrieval. The default value is 1.0.
},
"qp": {
"query_extend": false, # Specifies whether to perform query expansion on the user query.
"query_extend_num": 5 # The number of expanded queries. The default value is 5.
},
"rerank" : {
"enable": true, # Specifies whether to use an LLM to rerank the retrieved results. The default value is true.
"model":"model_name" # Replace with a specific model name.
},
"return_hits": false # Specifies whether to return the document retrieval results in the response, which is the search_hits parameter.
}
}
}
Response parameters
Parameter | Type | Description |
request_id | string | The request ID. |
status | string | The processing status of the request. |
latency | float | The time it took the server to process the request. Unit: milliseconds. |
id | integer | The primary key ID. |
title | string | The title of the document. |
category | string | The category name. |
url | string | The document link. |
answer | string | The Q&A result. |
type | string | The type of the returned result. |
scores | array | The document content score. |
event | string | The thinking event. A round of the thinking process consists of THINK, ACTION, and ANSWER events. THINK indicates the thinking process and is not always returned. ACTION indicates the action that is performed. ANSWER indicates the conclusion of the current thinking round. SUMMARY indicates the final answer. Only one SUMMARY event of the text type is returned. |
event_status | string | Indicates whether the result is complete. Valid values: PROCESSING and FINISHED. |
code | string | The error code. This parameter is returned if an error occurs. |
message | string | The error message. This parameter is returned if an error occurs. |
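For agent responses, the final answer can be picked out of the event sequence by its event and event_status fields. The following sketch assumes each event is a parsed JSON object with event, event_status, and answer fields; the exact payload shape is an assumption for illustration.

```python
def final_answer(events):
    """Return the answer of the finished SUMMARY event, or None."""
    for ev in events:
        if ev.get("event") == "SUMMARY" and ev.get("event_status") == "FINISHED":
            return ev.get("answer")
    return None

# Synthetic event sequence for illustration.
events = [
    {"event": "THINK", "event_status": "FINISHED", "answer": "reasoning..."},
    {"event": "ACTION", "event_status": "FINISHED", "answer": "knowledge_search"},
    {"event": "SUMMARY", "event_status": "FINISHED", "answer": "final answer"},
]
print(final_answer(events))  # -> final answer
```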
Sample response body
{
"request_id": "6859E98D-D885-4AEF-B61C-9683A0184744",
"status": "OK",
"latency": 6684.410397,
"result" : {
"data" : [
{
"answer" : "answer text",
"type" : "TEXT",
"reference" : [
{"url" : "http://....","title":"doc title"}
]
},
{
"reference": [
{"id": "16","title": "Test title","category": "Test category","url": "Test URL"}
],
"answer": "https://ecmb.bdimg.com/tam-ogel/-xxxx.jpg",
"type": "IMAGE"
}
],
"search_hits" : [ // This parameter is returned only if you set options.retrieve.return_hits to true in the request.
{
"fields" : {
"content" : "...."
"key1" : "value1"
},
"scores" : ["10000.1234"],
"type" : "doc"
},
{
"fields"{
"answer" : "...",
"key1" : "value1"
},
"scores" : ["10000.1234"],
"type" : "entry"
}
]
},
"errors" : [
{
"code" : "The error code. This parameter is returned if an error occurs.",
"message" : "The error message. This parameter is returned if an error occurs."
}
]
}
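A response body like the sample above can be processed as follows. This is a minimal sketch: it separates TEXT and IMAGE answers and raises an error when status is not OK.

```python
# A trimmed copy of the sample response body above, for illustration.
sample = {
    "request_id": "6859E98D-D885-4AEF-B61C-9683A0184744",
    "status": "OK",
    "result": {
        "data": [
            {"answer": "answer text", "type": "TEXT",
             "reference": [{"url": "http://....", "title": "doc title"}]},
            {"answer": "https://ecmb.bdimg.com/tam-ogel/-xxxx.jpg", "type": "IMAGE",
             "reference": [{"id": "16", "title": "Test title"}]},
        ]
    },
}

def extract_answers(body):
    """Return (text_answers, image_urls) from a multi-search response."""
    if body.get("status") != "OK":
        # On failure, error details appear in the "errors" array.
        raise RuntimeError(body.get("errors"))
    texts, images = [], []
    for item in body.get("result", {}).get("data", []):
        if item.get("type") == "TEXT":
            texts.append(item.get("answer", ""))
        elif item.get("type") == "IMAGE":
            images.append(item.get("answer", ""))
    return texts, images

texts, images = extract_answers(sample)
print(texts)  # -> ['answer text']
```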