
OpenSearch: Unified Q&A API

Last Updated: Mar 11, 2026

The MultiSearch unified Q&A API retrieves content from a specified knowledge base. This API supports both text-based and table-based query methods, allowing you to perform intelligent Q&A on structured tables and unstructured documents.

Prerequisites

  • Obtain an API key. An API key is required for identity authentication when you call the OpenSearch LLM-Based Conversational Search service. For more information, see Manage API keys.

  • Obtain a service endpoint. You must provide a service endpoint to call operations of the OpenSearch LLM-Based Conversational Search service. For more information, see Obtain service endpoints.

Precautions

Note the following when you use the unified Q&A API:

  • If a custom table exists, a table-based Q&A query is performed first. Otherwise, a text-based Q&A query is performed.

  • If only one custom table exists, the Q&A query is performed on that table. If multiple custom tables exist, the system automatically selects the table most relevant to the user's question and performs the Q&A query on that table.

  • If the table-based Q&A query returns no results, a text-based Q&A query is performed.

  • If you use this API with multiple custom tables, ensure that the tables are highly distinct. Otherwise, incorrect answers may be returned.
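The routing rules above can be summarized in a short sketch. The service applies these rules on the server side. The function and callback names below (route_question, pick_table, table_qa, text_qa) are hypothetical and exist only to illustrate the order of the decisions.

```python
from typing import Callable, Optional, Sequence


def route_question(
    question: str,
    tables: Sequence[str],
    pick_table: Callable[[str, Sequence[str]], str],
    table_qa: Callable[[str, str], Optional[str]],
    text_qa: Callable[[str], str],
) -> str:
    """Illustrative routing only; all callbacks are hypothetical stand-ins."""
    if tables:
        # One custom table: use it. Several: pick the most relevant one.
        table = tables[0] if len(tables) == 1 else pick_table(question, tables)
        answer = table_qa(question, table)
        if answer:
            # The table-based query produced a result.
            return answer
    # No custom table exists, or the table-based query returned nothing.
    return text_qa(question)
```

Because the table is selected automatically when several exist, tables with overlapping content make the pick_table step ambiguous, which is why highly distinct tables are recommended.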

API information

Request method: POST

Protocol: HTTP

Request data format: JSON

Request URL

{host}/v3/openapi/apps/[app_group_identity]/actions/multi-search
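As a minimal sketch, the request URL can be assembled from the service endpoint and the application name as follows. The helper name build_multi_search_url is hypothetical, and the example assumes that an endpoint without a scheme should be prefixed with http://, based on the protocol listed above.

```python
from urllib.parse import quote


def build_multi_search_url(host: str, app_group_identity: str) -> str:
    """Fill the {host} and [app_group_identity] placeholders of the request URL."""
    host = host.rstrip("/")
    if "://" not in host:
        # Assumption: default to HTTP when the endpoint has no scheme.
        host = "http://" + host
    app = quote(app_group_identity, safe="")  # percent-encode unsafe characters
    return f"{host}/v3/openapi/apps/{app}/actions/multi-search"
```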

Request parameters

Header parameters

Parameter

Type

Required

Description

Example

Content-Type

string

Yes

The data format of the request. Set the value to "application/json".

application/json

Authorization

string

Yes

The API key for request authentication. The value must start with Bearer.

Bearer OS-d1**2a

accept

string

No

For Server-Sent Events (SSE) requests, set the value to "text/event-stream".

text/event-stream
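The header parameters above can be assembled as in the following sketch. The helper name build_headers and its sse flag are illustrative and not part of the API. The accept header is added only for SSE requests.

```python
from typing import Dict


def build_headers(api_key: str, sse: bool = False) -> Dict[str, str]:
    """Build the request headers described in the header parameter table."""
    # The Authorization value must start with "Bearer".
    token = api_key if api_key.startswith("Bearer ") else f"Bearer {api_key}"
    headers = {
        "Content-Type": "application/json",
        "Authorization": token,
    }
    if sse:
        # Only needed for Server-Sent Events (streaming) requests.
        headers["accept"] = "text/event-stream"
    return headers
```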

Body parameters

Parameter

Type

Required

Description

Example

question

map

Yes

The input question.

{
  "text": "user question",
  "type": "TEXT",
  "session": ""
}

question.text

string

Yes

The text of the input question.

user question

question.session

string

No

The session ID for a multi-turn conversation. This ID identifies the context of the conversation. Valid values:

  • If you do not set this parameter or leave it empty, multi-turn conversation is disabled.

  • If you set this parameter to a non-empty value, multi-turn conversation is enabled. The system retains conversations with the same session ID as the context for the multi-turn conversation.

1725530408586

question.type

string

No

The type of the input question. Set the value to TEXT.

TEXT

options

map

No

The additional request parameters that control retrieval, models, prompts, and more.

options.chat

map

No

The parameters related to large language model (LLM) access.

options.chat.disable

boolean

No

Specifies whether to disable LLM access.

  • false (default): Accesses the LLM to summarize and generate results.

  • true: Does not access the LLM.

false

options.chat.stream

boolean

No

Specifies whether to return results in a stream.

  • true (default): Returns results in a stream.

  • false: Returns results in a non-streaming manner.

true

options.chat.model

string

No

The LLM to use. Valid values:

Singapore region

  • opensearch-llama2-13b

  • opensearch-falcon-7b

  • qwen-turbo

  • qwen-plus

  • qwen-max

  • qwen2-72b-instruct

opensearch-llama2-13b

options.chat.enable_deep_search

boolean

No

Specifies whether to enable deep search.

  • true: Enables deep search. Multiple rounds of inference are required to synthesize data and return results, so a single conversation consumes more time and compute resources.

  • false

false

options.chat.model_generation

integer

No

When using a product-customized model, set the corresponding model version. By default, the oldest version is used.

20

options.chat.prompt_template

string

No

The name of the custom prompt template. If this parameter is empty, the system's built-in prompt template is used.

user_defined_prompt_name

options.chat.prompt_config

object

No

The key-value pairs configured in the custom prompt. The parameter format is:

{
  "String_key": "value",
  "Integer_key": 1
}

Example:

{
  "attitude": "normal",
  "rule": "detailed",
  "noanswer": "sorry",
  "language": "Chinese",
  "role": false,
  "role_name": "AI Assistant"
}

options.chat.prompt_config.attitude

string

No

A parameter in the built-in template that controls the tone of the conversation. The default value is normal.

  • normal: Neutral.

  • polite: Uses a kind and polite tone.

  • patience: Uses a gentle and patient tone.

normal

options.chat.prompt_config.rule

string

No

The level of detail in the conversation. The default value is detailed.

  • detailed: Detailed and professional.

  • stepbystep: Detailed and step-by-step.

detailed

options.chat.prompt_config.noanswer

string

No

The response when an answer cannot be found. The default value is sorry.

  • sorry: Sorry, I cannot answer this question based on the available information.

  • uncertain: I don't know.

sorry

options.chat.prompt_config.language

string

No

The language used for the answer. The default value is Chinese.

  • Chinese

  • English

  • Thai

  • Korean

Chinese

options.chat.prompt_config.role

boolean

No

Specifies whether to enable a custom role for the answer.

false

options.chat.prompt_config.role_name

string

No

The name of the custom role. For example: AI Assistant.

AI Assistant

options.chat.prompt_config.out_format

string

No

The format of the output content. The default value is text.

  • text

  • table

  • list

  • markdown

text

options.chat.generate_config.repetition_penalty

float

No

Controls the repetition of consecutive sequences in the model's output. Increasing the repetition_penalty reduces the repetition in the generated text. A value of 1.0 means no penalty. There is no strict value range.

1.01

options.chat.generate_config.top_k

integer

No

The size of the candidate set for sampling during generation. For example, a value of 50 means that only the top 50 tokens with the highest scores in a single generation are used to form the random sampling candidate set. A larger value increases the randomness of the generated text. A smaller value increases the determinism. The default value is 0, which disables the top_k strategy. In this case, only the top_p strategy takes effect.

50

options.chat.generate_config.top_p

float

No

The probability threshold for nucleus sampling during generation. For example, a value of 0.8 means that only the smallest set of most likely tokens with a cumulative probability of 0.8 or higher is retained as the candidate set. The value must be in the range of (0, 1.0). A larger value increases the randomness of the generated text. A smaller value increases the determinism.

0.5

options.chat.generate_config.temperature

float

No

Controls the degree of randomness and diversity. Specifically, the temperature value controls the degree of smoothing applied to the probability distribution of each candidate word during text generation. A higher temperature value flattens the probability distribution, allowing more low-probability words to be selected and making the output more diverse. A lower temperature value sharpens the probability distribution, making high-probability words more likely to be selected and making the output more deterministic.

The value must be in the range of [0, 2). Do not set this parameter to 0.

SDK requirements: Python SDK 1.10.1 or later, Java SDK 2.5.1 or later.

0.7

options.chat.history_max

integer

No

The maximum number of historical rounds for a multi-turn conversation. The maximum value is 20. The default value is 1.

20

options.chat.link

boolean

No

Specifies whether to return links. This controls whether the content generated by the model indicates the source of the reference. Valid values:

  • true

  • false (default)

The following sample response is returned if the content includes the source:

An ECS disk can be resized online or offline[^1^]. Online resizing does not require an instance restart, but offline resizing does[^1^]. To resize a disk, select the disk on the ECS console, choose Resize from the Actions column, and then select a resizing method as needed[^1^]. To resize partitions and file systems, obtain the required information from the console[^2^]. After a disk is resized, its capacity cannot be reduced. Plan your storage space carefully[^3^].

The number enclosed by [^ and ^] indicates the index of the document in the reference of the result. For example, [^1^] indicates the first document in the reference.

false

options.chat.rich_text_strategy

string

No

The post-processing method for the output of a rich text LLM. If this parameter is not configured or is empty, the rich text feature is disabled and the default behavior is used.

  • inside_response: The tags in the answer are restored directly to the original text in Markdown format. Note that tables are inserted directly into the Markdown file in HTML format.

  • extend_response: If the answer contains rich text tags, the actual content of each tag is returned separately in rich_text_ref. Image content is returned as a URL, table content is in HTML format, and code is in text format.

inside_response

options.chat.agent

map

No

The options for configuring retrieval-augmented generation (RAG) tool capabilities. When enabled, the model decides whether to execute the corresponding tool based on the existing content. The following LLMs support this feature:

  • qwen-plus

  • qwen-max

  • qwen2-72b-instruct

options.chat.agent.think_process

boolean

No

Specifies whether to return the thinking process.

true

options.chat.agent.max_think_round

integer

No

The number of thinking rounds. The maximum value is 20.

10

options.chat.agent.language

string

No

The language for the thinking process and the answer. Valid values:

  • AUTO: Automatically determines whether to use Chinese or English based on the user query.

  • CN

  • EN

AUTO

options.chat.agent.tools

list of string

No

The names of the RAG tools to use. The following tool is available:

  • knowledge_search: knowledge base retrieval

["knowledge_search"]

options.retrieve

map

No

The parameters that control document retrieval.

options.retrieve.web_search.enable

boolean

No

Specifies whether to enable web search.

  • true: The results are returned based on web search data. A single conversation consumes more time and compute resources.

  • false

false

options.retrieve.doc

map

No

The parameters that control retrieval.

options.retrieve.doc.disable

boolean

No

Specifies whether to disable knowledge base retrieval.

  • false (default)

  • true

false

options.retrieve.doc.filter

string

No

Filters the data retrieved from the knowledge base. By default, this parameter is empty. For more information about how to use the filter parameter, see filter parameter.

Supported fields:

  • table: The table where the document is located.

  • raw_pk: The primary key of the document.

  • category: The category of the document.

  • score: The score of the document.

  • timestamp: The timestamp of the document.

Example format:

"filter" : "raw_pk=\"123\""   # Retrieves data only from the document whose primary key (raw_pk) is 123.
"filter" : "category=\"value1\""   # Retrieves data only from documents in the category 'value1'.
"filter" : "category=\"value1\" OR category=\"value2\"" # Retrieves data only from documents in the category 'value1' or 'value2'.
"filter" : "score>1.0"   # Retrieves data only from documents with a score greater than 1.0.
"filter" : "timestamp>1356969600"   # Retrieves data only from documents with a timestamp after 2013-01-01.

category=\"value1\"

options.retrieve.doc.sf

float

No

The threshold for the vector score in vector retrieval.

  • If sparse vectors are not enabled, the value must be in the range of [0, 2.0]. The default value is 1.3. A smaller value indicates that the results are more relevant, but fewer results are returned. A larger value may retrieve less relevant results.

  • If sparse vectors are enabled, the default value is 0.35. A larger value indicates that the retrieved results are more relevant, but fewer results are returned. A smaller value may retrieve less relevant results.

0.35

options.retrieve.doc.top_n

integer

No

The number of documents to retrieve. The default value is 5. The value must be in the range of (0, 50].

5

options.retrieve.doc.formula

string

No

The formula for sorting documents during retrieval.

Note

For more information about the syntax, see Fine sort functions. The algorithm relevance and geographical location relevance features are not supported.

-timestamp: Sorts documents in descending order by the timestamp field.

options.retrieve.doc.rerank_size

integer

No

The number of documents to rerank when the rerank feature is enabled. The default value is 30. The value must be in the range of (0, 100].

30

options.retrieve.doc.operator

string

No

The relationship between the terms after the question.text is tokenized for knowledge base retrieval. This parameter takes effect only when sparse vectors are not enabled.

  • AND (default): All terms must match for a document to be retrieved.

  • OR: At least one term must match for a document to be retrieved.

AND

options.retrieve.doc.dense_weight

float

No

The weight of the dense vector during document retrieval when sparse vectors are enabled. The value must be in the range of (0.0, 1.0). The default value is 0.7.

0.7

options.retrieve.entry

map

No

The parameters that control the retrieval of results from manually intervened data.

options.retrieve.entry.disable

boolean

No

Specifies whether to disable the retrieval of manually intervened data.

  • false (default)

  • true

false

options.retrieve.entry.sf

float

No

The threshold for the vector score for retrieving manually intervened data. The value must be in the range of [0, 2.0]. The default value is 0.3. A smaller value indicates that the results are more relevant, but fewer results are returned. A larger value may retrieve less relevant results.

0.3

options.retrieve.image

map

No

The parameters that control the retrieval of image results from the knowledge base.

options.retrieve.image.disable

boolean

No

Specifies whether to disable the retrieval of image data.

  • false (default)

  • true

false

options.retrieve.image.sf

float

No

The threshold for the vector score in vector retrieval.

  • If sparse vectors are not enabled, the value must be in the range of [0, 2.0]. The default value is 1.0. A smaller value indicates that the results are more relevant, but fewer results are returned. A larger value may retrieve less relevant results.

  • If sparse vectors are enabled, the default value is 0.5. A larger value indicates that the retrieved results are more relevant, but fewer results are returned. A smaller value may retrieve less relevant results.

1.0

options.retrieve.image.dense_weight

float

No

The weight of the dense vector during image retrieval when sparse vectors are enabled. The value must be in the range of (0.0, 1.0). The default value is 0.7.

0.7

options.retrieve.qp

map

No

The options for query rewriting.

options.retrieve.qp.query_extend

boolean

No

Specifies whether to expand the user query. The expanded queries are used to retrieve document segments in the engine. The default value is false.

  • false (default): Does not perform query expansion. This maintains consistency with the original logic.

  • true: Performs an additional interaction with the LLM, which slows down the response speed. Do not enable this for time-sensitive applications.

false

options.retrieve.qp.query_extend_num

integer

No

The maximum number of queries to expand when similar query expansion is enabled. The default value is 5.

5

options.retrieve.rerank

map

No

The options for reranking during document retrieval.

options.retrieve.rerank.enable

boolean

No

Specifies whether to use a model to rerank the retrieved results based on relevance. If options.retrieve.doc.formula is not empty, the default value is false. Otherwise, the default value is true. Valid values:

  • true

  • false

true

options.retrieve.rerank.model

string

No

The name of the model used for reranking.

  • ops-bge-reranker-larger (default): The bge-reranker model.

  • ops-text-reranker-001: A self-developed reranker model.

ops-bge-reranker-larger

options.retrieve.return_hits

boolean

No

Specifies whether to return the document retrieval results in the search_hits field of the response.

false
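Several of the body parameters above have documented value ranges. The following sketch checks an options map against those ranges on the client before a request is sent. The function names are hypothetical, and the service performs its own validation, so this only catches obvious mistakes early.

```python
from typing import Any, Callable, Dict, List


def _get(options: Dict[str, Any], dotted: str) -> Any:
    """Walk a nested dict by a dotted path; return None if any key is missing."""
    node: Any = options
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def validate_options(options: Dict[str, Any]) -> List[str]:
    """Return a list of range violations; an empty list means no problem found."""
    problems: List[str] = []

    def check(path: str, ok: Callable[[Any], bool], rule: str) -> None:
        value = _get(options, path)
        if value is not None and not ok(value):
            problems.append(f"{path}={value!r} violates {rule}")

    # Ranges copied from the parameter descriptions in this document.
    check("retrieve.doc.top_n", lambda v: 0 < v <= 50, "(0, 50]")
    check("retrieve.doc.rerank_size", lambda v: 0 < v <= 100, "(0, 100]")
    check("retrieve.doc.dense_weight", lambda v: 0.0 < v < 1.0, "(0.0, 1.0)")
    check("chat.history_max", lambda v: v <= 20, "maximum 20")
    check("chat.agent.max_think_round", lambda v: v <= 20, "maximum 20")
    check("chat.generate_config.top_p", lambda v: 0 < v < 1.0, "(0, 1.0)")
    check("chat.generate_config.temperature", lambda v: 0 < v < 2, "[0, 2), and not 0")
    return problems
```

For example, validate_options({"retrieve": {"doc": {"top_n": 51}}}) reports one violation for top_n, the same condition that the service rejects with error code 3005.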

Sample request body

{
  "question": {
    "text": "What is Alibaba Cloud OpenSearch?",
    "session": "session_001",
    "type": "TEXT"
  },
  "options": {
    "chat": {
      "disable": false,
      "stream": false,
      "model": "qwen-plus",
      "history_max": 20,
      "link": false,
      "agent": {
        "tools": ["knowledge_search"]
      }
    },
    "retrieve": {
      "doc": {
        "disable": false,
        "filter": "category=\"type\"",
        "sf": 0.35,
        "top_n": 5,
        "operator": "OR"
      },
      "web_search": { "enable": false },
      "entry": { "disable": false, "sf": 0.3 },
      "image": { "disable": false, "sf": 1.0 },
      "rerank": {
        "enable": true,
        "model": "ops-bge-reranker-larger"
      },
      "return_hits": false
    }
  }
}

Key parameter descriptions:

  • question.session: Enables the multi-turn conversation context when set.

  • options.chat.disable: Set to true to skip the LLM and return retrieval results directly.

  • options.retrieve.doc.top_n: Controls the number of retrieved documents. The default value is 5.

  • options.retrieve.return_hits: Set to true to return the details of search_hits.
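The following sketch shows one way to build a request body from the key parameters above and wrap it in a POST request with the Python standard library. The helper names (build_body, build_request) and their arguments are hypothetical, the URL and API key in the usage note are placeholders, and the example uses non-streaming mode to keep response handling simple.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_body(
    text: str,
    session: Optional[str] = None,
    use_llm: bool = True,
    top_n: int = 5,
    return_hits: bool = False,
) -> Dict[str, Any]:
    """Assemble a minimal request body from the key parameters."""
    question: Dict[str, Any] = {"text": text, "type": "TEXT"}
    if session:
        # A non-empty session ID enables multi-turn conversation.
        question["session"] = session
    return {
        "question": question,
        "options": {
            # disable=True skips the LLM and returns retrieval results directly.
            "chat": {"disable": not use_llm, "stream": False},
            "retrieve": {"doc": {"top_n": top_n}, "return_hits": return_hits},
        },
    }


def build_request(url: str, api_key: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Wrap the body in a POST request; nothing is sent until urlopen is called."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen and decode the JSON response, for example json.load(urllib.request.urlopen(req)).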

Response parameters

Parameter

Type

Description

request_id

string

The request ID.

status

string

The processing status of the request.

  • OK

  • FAIL

latency

float

The time the server took to process the request upon success. Unit: milliseconds.

id

integer

The primary key ID.

title

string

The document title.

category

string

The category name.

url

string

The document link.

answer

string

The Q&A result.

type

string

The type of the returned result.

scores

array

The document content score.

event

string

The thinking event.

A round of the thinking process consists of THINK, ACTION, and ANSWER. The THINK event is not always returned. THINK indicates the thinking process. ACTION indicates the action performed. ANSWER indicates the conclusion of the current thinking round. SUMMARY is the final answer. Only one SUMMARY event of the text type is returned.

event_status

string

Indicates whether the result is complete. Valid values:

  • PROCESSING

  • FINISHED

code

string

The error code. This parameter is returned if an error occurs.

message

string

The error message. This parameter is returned if an error occurs.
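When the thinking process is returned, the event and event_status fields can be used to separate intermediate steps from the final answer. The sketch below assumes that each returned item is a flat map that carries event, event_status, and answer, and that the SUMMARY item marked FINISHED holds the complete answer. Neither assumption is confirmed by this document, so adjust the field access to the actual payload.

```python
from typing import Dict, Iterable, List, Optional


def final_summary(items: Iterable[Dict[str, str]]) -> Optional[str]:
    """Return the answer of the finished SUMMARY event, or None if absent."""
    for item in items:
        if item.get("event") == "SUMMARY" and item.get("event_status") == "FINISHED":
            return item.get("answer")
    return None


def thinking_trace(items: Iterable[Dict[str, str]]) -> List[str]:
    """List the THINK, ACTION, and ANSWER steps in the order they arrived."""
    steps = ("THINK", "ACTION", "ANSWER")
    return [
        f'{item["event"]}: {item.get("answer", "")}'
        for item in items
        if item.get("event") in steps
    ]
```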

Sample response body

Successful response

{
  "request_id": "6859E98D-D885-4AEF-B61C-9683A0184744",
  "status": "OK",
  "latency": 6684.41,
  "result": {
    "data": [
      {
        "answer": "Alibaba Cloud OpenSearch is a structured data search managed service...",
        "type": "TEXT",
        "reference": [
          {"url": "https://www.alibabacloud.com/help/document_detail/463469.html", "title": "OpenSearch Product Introduction"}
        ]
      }
    ],
    "search_hits": [
      {
        "fields": {"content": "OpenSearch-related documentation content...", "title": "OpenSearch Introduction"},
        "scores": ["0.9778"],
        "type": "DOC"
      }
    ]
  }
}
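A successful response can be unpacked as in the following sketch, which follows the field layout of the sample above (result.data[].answer and result.data[].reference). It also resolves [^n^] citation markers, which appear in the answer when options.chat.link is true, to entries in the reference list. The function names are hypothetical.

```python
import re
from typing import Any, Dict, List

# Matches citation markers such as [^1^] in the generated answer.
CITATION = re.compile(r"\[\^(\d+)\^\]")


def extract_answers(response: Dict[str, Any]) -> List[str]:
    """Collect the answer strings from result.data."""
    data = response.get("result", {}).get("data", [])
    return [item["answer"] for item in data if "answer" in item]


def cited_references(item: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the references cited in the answer, in order of first citation."""
    refs = item.get("reference", [])
    cited: List[Dict[str, Any]] = []
    for match in CITATION.finditer(item.get("answer", "")):
        index = int(match.group(1))  # [^1^] is the first reference
        if 1 <= index <= len(refs) and refs[index - 1] not in cited:
            cited.append(refs[index - 1])
    return cited
```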

Error response

{
  "request_id": "e579a090bf99dc787d29d878b40c8367",
  "status": "FAIL",
  "errors": [
    {"code": 3005, "message": "topN[51] is not in (0, 50]"}
  ]
}

Common error codes

  • 2001: The application does not exist.

  • 3005: Parameter verification failed (for example, top_n is out of range).

  • 4016: The request header is missing authentication information.
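A client can turn a FAIL response into an exception that carries the error code and a short description. The sketch below is one possible approach. The MultiSearchError class and raise_for_status function are hypothetical, and the code is converted with int() because the sample response shows a numeric code while the parameter table lists the type as string.

```python
from typing import Any, Dict, Optional

# Descriptions taken from the common error codes listed above.
ERROR_HINTS = {
    2001: "The application does not exist.",
    3005: "Parameter verification failed.",
    4016: "The request header is missing authentication information.",
}


class MultiSearchError(Exception):
    """Raised when the response status is not OK (hypothetical helper class)."""

    def __init__(self, code: Optional[int], message: str, request_id: str) -> None:
        hint = ERROR_HINTS.get(code, "Unknown error code.")
        super().__init__(f"[{code}] {hint} {message} (request_id={request_id})")
        self.code = code
        self.request_id = request_id


def _as_int(code: Any) -> Optional[int]:
    """Accept the code as either a number or a numeric string."""
    try:
        return int(code)
    except (TypeError, ValueError):
        return None


def raise_for_status(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the response unchanged if status is OK, otherwise raise."""
    if response.get("status") == "OK":
        return response
    first = (response.get("errors") or [{}])[0]
    raise MultiSearchError(
        _as_int(first.get("code")),
        str(first.get("message", "")),
        str(response.get("request_id", "")),
    )
```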