This topic describes the MultiSearch unified question and answer (Q&A) API. You can use this API to retrieve content from a knowledge base and perform text-based and table-based Q&A queries.
Prerequisites
Obtain an API key. You must use this key for identity authentication when you call operations of OpenSearch LLM-Based Conversational Search. For more information, see Manage API keys.
Obtain a service endpoint. You must provide a service endpoint when you call the operations of OpenSearch LLM-Based Conversational Search. For more information, see Obtain service endpoints.
Precautions
If a custom table exists, a table-based Q&A query is performed first. Otherwise, a text-based Q&A query is performed.
If only one custom table exists, the Q&A query is performed on that table. If multiple custom tables exist, the system automatically selects the table most relevant to the user's question and performs the Q&A query on that table.
If the table-based Q&A query returns no results, a text-based Q&A query is performed.
If you use this API and multiple custom tables exist, ensure that the tables are highly distinct from each other. Otherwise, incorrect answers may be returned.
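The routing rules above can be sketched as follows. This is an illustrative sketch only: the function and parameter names are hypothetical, and the service performs this selection internally.

```python
def route_question(question, custom_tables, table_qa, text_qa, pick_most_relevant):
    """Mimic the documented Q&A routing rules (illustrative only)."""
    if not custom_tables:
        # No custom table: perform a text-based Q&A query.
        return text_qa(question)
    if len(custom_tables) == 1:
        # Exactly one custom table: query that table.
        table = custom_tables[0]
    else:
        # Multiple tables: the system selects the most relevant one.
        table = pick_most_relevant(question, custom_tables)
    answer = table_qa(question, table)
    if answer is None:
        # Table-based Q&A returned no results: fall back to text-based Q&A.
        return text_qa(question)
    return answer
```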
API information
Request method | Protocol | Request data format |
POST | HTTP | JSON |
Request URL
{host}/v3/openapi/apps/{app_group_identity}/actions/multi-search
{host}: The service endpoint. You can call the API over the Internet or through a VPC. For more information, see Obtain service endpoints.
{app_group_identity}: The application name. To obtain the application name, log on to the OpenSearch LLM-Based Conversational Search console and view the application name for the instance on the Instance Management page.
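For illustration, the following sketch assembles the request URL and headers in Python. The endpoint, application name, and API key below are placeholder values, not real credentials; replace them with your own. The commented-out lines show how the request could be sent with the standard library.

```python
import json
from urllib import request

# Hypothetical values: substitute your own endpoint, application name, and API key.
host = "http://opensearch-cn-hangzhou.aliyuncs.com"  # placeholder endpoint
app_group_identity = "my_app"                        # placeholder application name
api_key = "OS-d1**2a"                                # placeholder API key

url = f"{host}/v3/openapi/apps/{app_group_identity}/actions/multi-search"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",  # the value must start with "Bearer"
}

body = {"question": {"text": "user question", "type": "TEXT"}}

# Sending the request (requires network access):
# req = request.Request(url, data=json.dumps(body).encode("utf-8"),
#                       headers=headers, method="POST")
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```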
Request parameters
Header parameters
Parameter | Type | Required | Description | Example |
Content-Type | string | Yes | The data format of the request. Set the value to "application/json". | application/json |
Authorization | string | Yes | The API key for request authentication. The value must start with Bearer. | Bearer OS-d1**2a |
accept | string | No | For Server-Sent Events (SSE) requests, set the value to "text/event-stream". | text/event-stream |
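If you request streaming results over SSE, the response arrives as an event stream in which each payload is carried on a "data:" line. The following sketch parses such a stream; the SSE line framing follows the standard, but the JSON shape of each payload is an assumption for illustration.

```python
import json

def iter_sse_data(lines):
    """Yield the payload of each "data:" line in an SSE stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield line[len("data:"):].strip()

# Synthetic stream for illustration; the real payload fields may differ.
stream = [
    'data: {"answer": "partial "}',
    "",
    'data: {"answer": "text"}',
    "",
]
chunks = [json.loads(d)["answer"] for d in iter_sse_data(stream)]
print("".join(chunks))  # -> partial text
```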
Body parameters
Parameter | Type | Required | Description | Example |
question | map | Yes | The input question. | { "text":"user question", "type": "TEXT", "session" : "" } |
question.text | string | Yes | The text of the input question. | user question |
question.session | string | No | The session ID for a multi-round conversation. This ID identifies the context of the conversation. If you specify this parameter, the multi-round conversation feature is enabled. | 1725530408586 |
question.type | string | No | The type of the input question. Set the value to TEXT. | TEXT |
options | map | No | The additional request parameters that control retrieval, models, prompts, and more. | |
options.chat | map | No | The parameters related to large language model (LLM) access. | |
options.chat.disable | boolean | No | Specifies whether to disable LLM access and directly return document retrieval results. The default value is false, which indicates that LLM generation is enabled. | false |
options.chat.stream | boolean | No | Specifies whether to return results in a stream. The default value is false. | true |
options.chat.model | string | No | The LLM to use. The available models vary by region, such as the Singapore region. | opensearch-llama2-13b |
options.chat.enable_deep_search | boolean | No | Specifies whether to enable deep search. | false |
options.chat.model_generation | integer | No | The version of the product-customized model to use. If this parameter is not specified, the earliest version is used. | 20 |
options.chat.prompt_template | string | No | The name of the custom prompt template. If this parameter is empty, the system's built-in prompt template is used. | user_defined_prompt_name |
options.chat.prompt_config | object | No | The key-value pairs configured in the custom prompt, in the format of {"key": "value"}. | |
options.chat.prompt_config.attitude | string | No | A parameter in the built-in template that controls the tone of the conversation. The default value is normal. | normal |
options.chat.prompt_config.rule | string | No | The level of detail in the conversation. The default value is detailed. | detailed |
options.chat.prompt_config.noanswer | string | No | The response when an answer cannot be found. The default value is sorry. | sorry |
options.chat.prompt_config.language | string | No | The language used for the answer. The default value is Chinese. | Chinese |
options.chat.prompt_config.role | boolean | No | Specifies whether to enable a custom role for the answer. If you set this parameter to true, the role specified by role_name is used. | false |
options.chat.prompt_config.role_name | string | No | The custom role for the answer. Example: AI Assistant. | AI Assistant |
options.chat.prompt_config.out_format | string | No | The format of the output content. The default value is text. | text |
options.chat.generate_config.repetition_penalty | float | No | Controls the repetition of consecutive sequences in the model's output. Increasing the repetition_penalty reduces the repetition in the generated text. A value of 1.0 means no penalty. There is no strict value range. | 1.01 |
options.chat.generate_config.top_k | integer | No | The size of the candidate set for sampling during generation. For example, a value of 50 means that only the top 50 tokens with the highest scores in a single generation are used to form the random sampling candidate set. A larger value increases the randomness of the generated text. A smaller value increases the determinism. The default value is 0, which disables the top_k strategy. In this case, only the top_p strategy takes effect. | 50 |
options.chat.generate_config.top_p | float | No | The probability threshold for nucleus sampling during generation. For example, a value of 0.8 means that only the smallest set of most likely tokens with a cumulative probability of 0.8 or higher is retained as the candidate set. The value must be in the range of (0, 1.0). A larger value increases the randomness of the generated text. A smaller value increases the determinism. | 0.5 |
options.chat.generate_config.temperature | float | No | Controls the degree of randomness and diversity. Specifically, the temperature value controls the degree of smoothing applied to the probability distribution of each candidate word during text generation. A higher temperature value flattens the probability distribution, allowing more low-probability words to be selected and making the output more diverse. A lower temperature value sharpens the probability distribution, making high-probability words more likely to be selected and making the output more deterministic. The value must be in the range of [0, 2). We recommend that you do not set this parameter to 0. This parameter requires SDK for Python V1.10.1 or later, or SDK for Java V2.5.1 or later. | 0.7 |
options.chat.history_max | integer | No | The maximum number of historical rounds for a multi-round conversation. The maximum value is 20. The default value is 1. | 20 |
options.chat.link | boolean | No | Specifies whether to return links. This controls whether the content generated by the model indicates the source of the reference. If the content includes a source, the answer contains reference numbers that correspond to entries in the reference list. | false |
options.chat.rich_text_strategy | string | No | The post-processing method for the output of a rich text LLM. If this parameter is not configured or is empty, the rich text feature is disabled and the default behavior is used. | inside_response |
options.chat.agent | map | No | The options for configuring retrieval-augmented generation (RAG) tool capabilities. If this feature is enabled, the model decides whether to execute the corresponding tool based on the existing content. Only specific LLMs support this feature. | |
options.chat.agent.think_process | boolean | No | Specifies whether to return the thinking process. | true |
options.chat.agent.max_think_round | integer | No | The number of thinking rounds. The maximum value is 20. | 10 |
options.chat.agent.language | string | No | The language for the thinking process and the answer. Valid values: AUTO: automatically determines whether to use Chinese or English based on the user query. CN: Chinese. EN: English. | AUTO |
options.chat.agent.tools | list of string | No | The names of the RAG tools to use. Only the knowledge_search tool is available. | ["knowledge_search"] |
options.retrieve | map | No | The parameters that control retrieval, such as knowledge base retrieval, web search, query rewriting, and reranking. | |
options.retrieve.web_search.enable | boolean | No | Specifies whether to enable web search. The default value is false. | false |
options.retrieve.doc | map | No | The parameters that control document retrieval from the knowledge base. | |
options.retrieve.doc.disable | boolean | No | Specifies whether to disable knowledge base retrieval. The default value is false. | false |
options.retrieve.doc.filter | string | No | Filters the data retrieved from the knowledge base. By default, this parameter is empty. For more information about how to use the filter parameter, see filter parameter. Example format: category=\"value1\". | category=\"value1\" |
options.retrieve.doc.sf | float | No | The threshold for the vector score in vector retrieval. | 0.35 |
options.retrieve.doc.top_n | integer | No | The number of documents to retrieve. The default value is 5. The value must be in the range of (0, 50]. | 5 |
options.retrieve.doc.formula | string | No | The formula for sorting documents during retrieval. Note For more information about the syntax, see Fine sort functions. The algorithm relevance and geographical location relevance features are not supported. | -timestamp: Sorts documents in descending order by the timestamp field. |
options.retrieve.doc.rerank_size | integer | No | The number of documents to rerank when the rerank feature is enabled. The default value is 30. The value must be in the range of (0, 100]. | 30 |
options.retrieve.doc.operator | string | No | The relationship between the terms after the question.text is tokenized for knowledge base retrieval. This parameter takes effect only when sparse vectors are not enabled. Valid values: AND and OR. The default value is AND. | AND |
options.retrieve.doc.dense_weight | float | No | The weight of the dense vector during document retrieval when sparse vectors are enabled. The value must be in the range of (0.0, 1.0). The default value is 0.7. | 0.7 |
options.retrieve.entry | map | No | The parameters that control the retrieval of results from manually intervened data. | |
options.retrieve.entry.disable | boolean | No | Specifies whether to disable the retrieval of manually intervened data. The default value is false. | false |
options.retrieve.entry.sf | float | No | The threshold for the vector score for retrieving manually intervened data. The value must be in the range of [0, 2.0]. The default value is 0.3. A smaller value indicates that the results are more relevant, but fewer results are returned. A larger value may retrieve less relevant results. | 0.3 |
options.retrieve.image | map | No | The parameters that control the retrieval of image results from the knowledge base. | |
options.retrieve.image.disable | boolean | No | Specifies whether to disable the retrieval of image data. The default value is false. | false |
options.retrieve.image.sf | float | No | The threshold for the vector score in vector retrieval of image data. The default value is 1.0. | 1.0 |
options.retrieve.image.dense_weight | float | No | The weight of the dense vector during image retrieval when sparse vectors are enabled. The value must be in the range of (0.0, 1.0). The default value is 0.7. | 0.7 |
options.retrieve.qp | map | No | The options for query rewriting. | |
options.retrieve.qp.query_extend | boolean | No | Specifies whether to expand the user query. The expanded queries are used to retrieve document segments in the engine. The default value is false. | false |
options.retrieve.qp.query_extend_num | integer | No | The maximum number of queries to expand when similar query expansion is enabled. The default value is 5. | 5 |
options.retrieve.rerank | map | No | The options for reranking during document retrieval. | |
options.retrieve.rerank.enable | boolean | No | Specifies whether to use a model to rerank the retrieved results based on relevance. The default value is true. | true |
options.retrieve.rerank.model | string | No | The name of the LLM used for reranking. | ops-bge-reranker-larger |
options.retrieve.return_hits | boolean | No | Specifies whether to return the document retrieval results in the response, which is the search_hits parameter. | false |
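The generate_config sampling parameters described above can be understood by how they reshape the model's next-token distribution. The following sketch is illustrative, not the service implementation: temperature rescales the logits before the softmax, and top_p keeps the smallest high-probability candidate set (nucleus sampling).

```python
import math

def softmax(logits, temperature=1.0):
    # A higher temperature flattens the distribution; a lower one sharpens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_candidates(probs, top_p):
    # Keep the smallest set of highest-probability tokens whose cumulative
    # probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept

logits = [2.0, 1.0, 0.5, 0.1]
sharp = softmax(logits, temperature=0.5)
flat = softmax(logits, temperature=2.0)
# The top token dominates more at low temperature than at high temperature.
print(sharp[0] > flat[0])  # -> True
print(top_p_candidates(softmax(logits), 0.8))  # -> [0, 1, 2]
```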
Sample request body
{
"question" : {
"text" : "user question",
"session" : "The session of the conversation. If you specify this parameter, the multi-round conversation feature is enabled.",
"type" : "TEXT"
},
"options": {
"chat": {
"disable" : false, # Specifies whether to disable LLM access and directly return document retrieval results. The default value is false, which indicates that LLM generation is enabled.
"stream" : false, # Specifies whether to return the results in a stream. The default value is false.
"model" : "Qwen", # The LLM to use.
"prompt_template" : "user_defined_prompt_name", # The name of the custom prompt.
"prompt_config" : { # The custom prompt configuration. This parameter is optional.
"key" : "value" # Replace the key and value with a specific key-value pair.
},
"generate_config" : {
"repetition_penalty": 1.01,
"top_k": 50,
"top_p": 0.5,
"temperature": 0.7
},
"history_max": 20, # The maximum number of historical rounds for a multi-round conversation.
"link": false, # Return links.
"agent":{
"tools":["knowledge_search"]
}
},
"retrieve": {
"doc": {
"disable": false, # Specifies whether to disable document retrieval. The default value is false.
"filter": "category=\"type\"", # Filters documents by the category field during retrieval. By default, this parameter is empty.
"sf": 0.35, # The vector retrieval threshold.
"top_n": 5, # The number of documents to retrieve. The default value is 5. The value must be in the range of (0, 50].
"formula" : "", # The default is vector similarity.
"rerank_size" : 5, # The number of documents for fine sorting. You do not need to set this parameter. The system determines the value.
"operator":"OR" # Specifies that the relationship between text tokens is OR during text retrieval. The default is AND.
},
"web_search":{
"enable": false # Specifies whether to enable web search. The default is false.
},
"entry": {
"disable": false, # Specifies whether to disable the retrieval of manually intervened data. The default value is false.
"sf": 0.3 # The vector relevance for the retrieval of intervened data. The default value is 0.3.
},
"image": {
"disable": false, # Specifies whether to disable image data retrieval. The default value is false.
"sf": 1.0 # The vector relevance for image data retrieval. The default value is 1.0.
},
"qp": {
"query_extend": false, # Specifies whether to perform query expansion on the user query.
"query_extend_num": 5 # The number of expanded queries. The default value is 5.
},
"rerank" : {
"enable": true, # Specifies whether to use an LLM to rerank the retrieved results. The default value is true.
"model":"model_name" # Replace with a specific model name.
},
"return_hits": false # Specifies whether to return the document retrieval results in the response, which is the search_hits parameter.
}
}
}
Response parameters
Parameter | Type | Description |
request_id | string | The request ID. |
status | string | The processing status of the request. |
latency | float | The time it took the server to process the request. Unit: milliseconds. |
id | integer | The primary key ID. |
title | string | The title of the document. |
category | string | The category name. |
url | string | The document link. |
answer | string | The Q&A result. |
type | string | The type of the returned result. |
scores | array | The document content score. |
event | string | The thinking event. A round of the thinking process consists of THINK, ACTION, and ANSWER events. THINK indicates the thinking process and is not always returned. ACTION indicates the action that is performed. ANSWER indicates the conclusion of the current thinking round. SUMMARY indicates the final answer. Only one SUMMARY event of the text type is returned. |
event_status | string | Indicates whether the result is complete. Valid values: PROCESSING and FINISHED. |
code | string | The error code. This parameter is returned if an error occurs. |
message | string | The error message. This parameter is returned if an error occurs. |
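For agent responses, the final answer can be picked out of the event sequence by its event and event_status fields. The following sketch assumes each event is a parsed JSON object with event, event_status, and answer fields; the exact payload shape is an assumption for illustration.

```python
def final_answer(events):
    """Return the answer of the finished SUMMARY event, or None."""
    for ev in events:
        if ev.get("event") == "SUMMARY" and ev.get("event_status") == "FINISHED":
            return ev.get("answer")
    return None

# Synthetic event sequence for illustration.
events = [
    {"event": "THINK", "event_status": "FINISHED", "answer": "reasoning..."},
    {"event": "ACTION", "event_status": "FINISHED", "answer": "knowledge_search"},
    {"event": "SUMMARY", "event_status": "FINISHED", "answer": "final answer"},
]
print(final_answer(events))  # -> final answer
```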
Sample response body
{
"request_id": "6859E98D-D885-4AEF-B61C-9683A0184744",
"status": "OK",
"latency": 6684.410397,
"result" : {
"data" : [
{
"answer" : "answer text",
"type" : "TEXT",
"reference" : [
{"url" : "http://....","title":"doc title"}
]
},
{
"reference": [
{"id": "16","title": "Test title","category": "Test category","url": "Test URL"}
],
"answer": "https://ecmb.bdimg.com/tam-ogel/-xxxx.jpg",
"type": "IMAGE"
}
],
"search_hits" : [ // This parameter is returned only if you set options.retrieve.return_hits to true in the request.
{
"fields" : {
"content" : "...."
"key1" : "value1"
},
"scores" : ["10000.1234"],
"type" : "doc"
},
{
"fields"{
"answer" : "...",
"key1" : "value1"
},
"scores" : ["10000.1234"],
"type" : "entry"
}
]
},
"errors" : [
{
"code" : "The error code. This parameter is returned if an error occurs.",
"message" : "The error message. This parameter is returned if an error occurs."
}
]
}
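A response body like the sample above can be processed as follows. This is a minimal sketch: it separates TEXT and IMAGE answers and raises an error when status is not OK.

```python
# A trimmed copy of the sample response body above, for illustration.
sample = {
    "request_id": "6859E98D-D885-4AEF-B61C-9683A0184744",
    "status": "OK",
    "result": {
        "data": [
            {"answer": "answer text", "type": "TEXT",
             "reference": [{"url": "http://....", "title": "doc title"}]},
            {"answer": "https://ecmb.bdimg.com/tam-ogel/-xxxx.jpg", "type": "IMAGE",
             "reference": [{"id": "16", "title": "Test title"}]},
        ]
    },
}

def extract_answers(body):
    """Return (text_answers, image_urls) from a multi-search response."""
    if body.get("status") != "OK":
        # On failure, error details appear in the "errors" array.
        raise RuntimeError(body.get("errors"))
    texts, images = [], []
    for item in body.get("result", {}).get("data", []):
        if item.get("type") == "TEXT":
            texts.append(item.get("answer", ""))
        elif item.get("type") == "IMAGE":
            images.append(item.get("answer", ""))
    return texts, images

texts, images = extract_answers(sample)
print(texts)  # -> ['answer text']
```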