
OpenSearch: Perform a text-based conversational search

Last Updated: Jun 17, 2025

This topic describes the API operation that you can call to perform a text-based conversational search based on a knowledge base. Multi-round conversations are supported: you can specify a session ID to retain the context of a multi-round conversation. You can also select the LLM that is used to generate answers and customize the generated answers.

Prerequisites

  • An API key for identity authentication is obtained. When you call the API operations of OpenSearch LLM-Based Conversational Search Edition, you must be authenticated. For more information, see Manage API keys. LLM is short for large language model.

  • An endpoint is obtained. When you call the API operations of OpenSearch LLM-Based Conversational Search Edition, you must specify an endpoint. For more information, see Obtain endpoints.

Operation information

Request method: POST

Request protocol: HTTP

Request data format: JSON

Request URL

{host}/v3/openapi/apps/{app_group_identity}/actions/knowledge-search
  • {host}: the endpoint that is used to call the API operation. You can call the API operation over the Internet or a virtual private cloud (VPC). For more information about how to obtain an endpoint, see Obtain endpoints.

  • {app_group_identity}: the name of the application that you want to access. You can log on to the OpenSearch LLM-Based Conversational Search Edition console and view the application name of the corresponding instance on the Instance Management page.

Request parameters

Header parameters

  • Content-Type (string, required): The data format of the request. Only the JSON format is supported. Set the value to application/json. Example: application/json

  • Authorization (string, required): The API key used for request authentication. The value must start with Bearer. Example: Bearer OS-d1**2a

Body parameters

  • question (map, required): The input question. Example: {"text": "user question", "type": "TEXT", "session": ""}

  • question.text (string, required): The text content of the input question. Example: user question

  • question.session (string, optional): The session ID of the multi-round conversation. The ID identifies the context of the conversation. Example: 1725530408586

    • If you do not specify this parameter or leave it empty, the multi-round conversation feature is disabled.

    • If you set this parameter to a non-empty value, the feature is enabled and the system retains the conversations that share the session ID as the context.

  • question.type (string, optional): The format of the input question. Example: TEXT
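The question map can be assembled with a small helper. This is an illustrative sketch, not part of any SDK; the question texts and session ID shown are placeholders.

```python
def build_question(text, session=None):
    """Build the `question` map for the request body.

    Omitting `session` (or passing an empty string) disables the
    multi-round conversation feature; any non-empty session ID enables
    it and groups requests that share the ID into one context.
    """
    return {
        "text": text,
        "type": "TEXT",
        "session": session or "",
    }

# Single-round question: no session ID, so no context is retained.
single = build_question("How do I resize an ECS disk?")

# Multi-round follow-up: reuse the same session ID across requests.
follow_up = build_question("Does that require a restart?", session="1725530408586")
```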

  • options (map, optional): The additional configurations, such as retrieval, model, and prompt configurations.

  • options.chat (map, optional): The configuration of LLM access.

  • options.chat.disable (boolean, optional): Specifies whether to disable LLM access. Example: false

    • false (default): enables LLM access. Results are summarized and generated by an LLM.

    • true: disables LLM access.

  • options.chat.stream (boolean, optional): Specifies whether to enable HTTP chunked transfer encoding. Valid values: true (default) and false. Example: true

  • options.chat.model (string, optional): The LLM to be used. Example: opensearch-llama2-13b

    Valid values in the Singapore region:

    • opensearch-llama2-13b

    • opensearch-falcon-7b

    • qwen-turbo

    • qwen-plus

    • qwen-max

    • qwen2-72b-instruct

  • options.chat.enable_deep_search (boolean, optional): Specifies whether to enable the deep search feature. Example: false

    • true: enables deep search. Multiple rounds of inference are performed to synthesize data and return results. A single conversation consumes a relatively large amount of time and computing resources.

    • false: disables deep search.

  • options.chat.model_generation (integer, optional): The version of the custom model to be used. By default, the earliest version is used. Example: 20

  • options.chat.prompt_template (string, optional): The name of the custom prompt template. By default, this parameter is left empty and the built-in prompt template is used. Example: user_defined_prompt_name

  • options.chat.prompt_config (object, optional): The configuration of the custom prompt template. Specify key-value pairs in the following format:

    {
      "String_key": "value",
      "Integer_key": 1
    }

    Example:

    {
      "attitude": "normal",
      "rule": "detailed",
      "noanswer": "sorry",
      "language": "Chinese",
      "role": false,
      "role_name": "AI assistant"
    }
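When the request body is built in code rather than raw JSON, the same prompt configuration looks as follows. This is an illustrative sketch; the template name is a placeholder, and note that Python's False becomes JSON false when the body is serialized.

```python
import json

# Sketch of an `options.chat` fragment that selects a custom prompt
# template and fills its keys. "user_defined_prompt_name" is a
# placeholder for a template you have defined in the console.
chat_options = {
    "prompt_template": "user_defined_prompt_name",
    "prompt_config": {
        "attitude": "normal",        # tone: normal | polite | patience
        "rule": "detailed",          # detail level: detailed | stepbystep
        "noanswer": "sorry",         # fallback answer: sorry | uncertain
        "language": "Chinese",       # answer language
        "role": False,               # no custom role
        "role_name": "AI assistant",
    },
}

# Serializing converts False to the JSON literal false.
encoded = json.dumps(chat_options["prompt_config"])
```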

  • options.chat.prompt_config.attitude (string, optional): The tone of the conversation. This parameter is included in the built-in prompt template. Valid values: normal (default), polite, and patience. Example: normal

  • options.chat.prompt_config.rule (string, optional): The detail level of the conversation. Valid values: detailed (default) and stepbystep. Example: detailed

  • options.chat.prompt_config.noanswer (string, optional): The information returned if the system fails to find an answer to the question. Valid values: sorry (default) and uncertain. Example: sorry

  • options.chat.prompt_config.language (string, optional): The language of the answer. Valid values: Chinese (default), English, Thai, and Korean. Example: Chinese

  • options.chat.prompt_config.role (boolean, optional): Specifies whether a custom role answers the question. If you set this parameter to true, you must specify a custom role. Example: false

  • options.chat.prompt_config.role_name (string, optional): The name of the custom role. Example: AI Assistant

  • options.chat.prompt_config.out_format (string, optional): The format of the answer. Valid values: text (default), table, list, and markdown. Example: text

  • options.chat.generate_config.repetition_penalty (float, optional): The penalty applied to repetition in the content generated by the model. A larger value indicates lower repetition. A value of 1.0 indicates no penalty. No value range is enforced for this parameter. Example: 1.01

  • options.chat.generate_config.top_k (integer, optional): The size of the candidate set from which tokens are sampled. For example, if this parameter is set to 50, the top 50 tokens with the highest probability are used as the candidate set. A larger value results in more random content, and a smaller value results in more deterministic content. Default value: 0, which disables the top_k parameter. In this case, only the top_p parameter takes effect. Example: 50

  • options.chat.generate_config.top_p (float, optional): The probability threshold in the nucleus sampling method used during generation. For example, if this parameter is set to 0.8, only the smallest subset of the most probable tokens whose cumulative probability is at least 0.8 is kept as the candidate set. Valid values: (0, 1.0). A larger value results in more random content, and a smaller value results in more deterministic content. Example: 0.5

  • options.chat.generate_config.temperature (float, optional): The level of randomness and diversity of the generated content. The temperature value determines how much the probability distribution over candidate tokens is smoothed during text generation. Larger temperature values flatten the distribution, allowing more low-probability tokens to be selected and producing more diverse content. Smaller temperature values sharpen the distribution, making high-probability tokens more likely to be chosen and producing more deterministic content. Valid values: [0, 2). We recommend that you do not set this parameter to 0, because a value of 0 is meaningless. SDK requirements: Python >= 1.10.1, Java >= 2.5.1. Example: 0.7
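The four sampling parameters above travel together in one generate_config map. In typical sampling implementations, top_k first truncates the candidate set, top_p then applies a cumulative-probability cutoff, and temperature reshapes the distribution. A minimal sketch of the map; the values are illustrative, not tuned recommendations:

```python
# Illustrative `options.chat.generate_config` fragment.
# Setting top_k to 0 would disable top_k sampling so that only
# top_p takes effect; here both are active.
generate_config = {
    "repetition_penalty": 1.01,  # > 1.0 penalizes repetition; 1.0 = no penalty
    "top_k": 50,                 # sample from the 50 most probable tokens
    "top_p": 0.5,                # keep the smallest set with >= 0.5 cumulative probability
    "temperature": 0.7,          # valid range [0, 2); lower = more deterministic
}
```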

  • options.chat.history_max (integer, optional): The maximum number of conversation rounds that the system uses as context when it returns results. Maximum value: 20. Default value: 1. Example: 20

  • options.chat.link (boolean, optional): Specifies whether to return the URL of the reference source, that is, whether the reference source is included in the content generated by the model. Valid values: true and false (default). Example: false

    Sample response if you set this parameter to true:

    You can resize the disk of an Elastic Compute Service (ECS) instance online or offline[^1^]. If you use the online resizing method, you can resize the disk without the need to restart the instance. If you use the offline resizing method, you must restart the instance[^1^]. To resize a disk, perform the following steps: Log on to the ECS console, find the disk that you want to resize, click Resize in the Actions column, and then select a resizing method based on your business requirements[^1^]. If you need to resize partitions and file systems, you can obtain relevant information by using the CLI or in the console[^2^]. After an ECS disk is resized, you cannot reduce the capacity. We recommend that you implement reasonable capacity planning[^3^].

    [^Number^] indicates the ordinal number of the retrieved document in the reference of the returned results. For example, [^1^] indicates the first document in the reference.

  • options.chat.rich_text_strategy (string, optional): The processing method for rich text. If this parameter is not specified or is left empty, rich text is disabled and the default processing method is used. Example: inside_response

    • inside_response: Each rich text tag in the answer is restored to the original content in the Markdown format. Note that a table is inserted into the Markdown output in the HTML format.

    • extend_response: The actual content of each rich text tag in the answer is returned in rich_text_ref. An image is returned as a URL, a table is returned in the HTML format, and code is returned in the text format.

  • options.chat.agent (map, optional): The configuration of the Retrieval-Augmented Generation (RAG) tool feature. If the feature is enabled, the model determines whether to use a RAG tool based on the existing content. The feature is supported by the following LLMs: qwen-plus, qwen-max, and qwen2-72b-instruct.

  • options.chat.agent.tools (list of string, optional): The names of the RAG tools to be used. The following tool is available: knowledge_search (knowledge base retrieval). Example: ["knowledge_search"]

  • options.retrieve (map, optional): The retrieval configurations.

  • options.retrieve.web_search.enable (boolean, optional): Specifies whether to enable the Internet search feature. Example: false

    • true: enables Internet search. Results are returned based on data across the Internet. A single conversation consumes a relatively large amount of time and computing resources.

    • false: disables Internet search.

  • options.retrieve.doc (map, optional): The document retrieval configuration.

  • options.retrieve.doc.disable (boolean, optional): Specifies whether to disable document retrieval. Valid values: false (default) and true. Example: false

  • options.retrieve.doc.filter (string, optional): The filter that is used to filter documents in the knowledge base by a specific field during document retrieval. By default, this parameter is left empty. For more information, see filter. Example: category=\"value1\"

    The following fields are supported:

    • table: a table.

    • raw_pk: the primary key of a document.

    • category: the category of a document.

    • score: the score of a document.

    • timestamp: the timestamp of a document.

    Examples:

    "filter" : "raw_pk=\"123\""   # Retrieves documents whose primary key is 123.
    "filter" : "category=\"value1\""   # Retrieves documents whose category is value1.
    "filter" : "category=\"value1\" OR category=\"value2\"" # Retrieves documents whose category is value1 or value2.
    "filter" : "score>1.0"   # Retrieves documents whose score is greater than 1.0.
    "filter" : "timestamp>1356969600"   # Retrieves documents whose timestamp is greater than 1356969600.
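Filter expressions are plain strings, so quoting the value correctly matters. A hypothetical helper (not part of any SDK) that builds an OR filter over several categories:

```python
def category_filter(categories):
    """Build a filter string that matches any of the given categories,
    for example 'category="value1" OR category="value2"'.
    """
    return " OR ".join('category="{}"'.format(c) for c in categories)

# Used as the value of options.retrieve.doc.filter. The backslash
# escapes shown in the examples above are added automatically when
# the request body is JSON-encoded.
flt = category_filter(["value1", "value2"])
```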

  • options.retrieve.doc.sf (float, optional): The vector relevance threshold for document retrieval. Example: 1.3

    • If the sparse vector model is disabled, valid values range from 0 to 2.0 and the default value is 1.3. A smaller value results in higher document relevance but fewer retrieved documents. A larger value may retrieve less relevant documents.

    • If the sparse vector model is enabled, the default value is 0.35. A larger value results in higher document relevance but fewer retrieved documents. A smaller value may retrieve less relevant documents.

  • options.retrieve.doc.top_n (integer, optional): The number of documents to be retrieved. Valid values: (0, 50]. Default value: 5. Example: 5

  • options.retrieve.doc.formula (string, optional): The formula by which the retrieved documents are sorted. Example: -timestamp, which sorts the retrieved documents in descending order by document timestamp.

    Note: For information about the syntax, see Fine sort functions. Algorithm relevance and geographical location relevance are not supported.

  • options.retrieve.doc.rerank_size (integer, optional): The number of documents to be reranked if the reranking feature is enabled. Valid values: (0, 100]. Default value: 30. Example: 30

  • options.retrieve.doc.operator (string, optional): The operator between the terms obtained after text segmentation during document retrieval. This parameter takes effect only if the sparse vector model is disabled. Example: AND

    • AND (default): Only documents that match all the terms are retrieved.

    • OR: Documents that match at least one of the terms are retrieved.

  • options.retrieve.doc.dense_weight (float, optional): The weight of the dense vector during document retrieval if the sparse vector model is enabled. Valid values: (0.0, 1.0). Default value: 0.7. Example: 0.7

  • options.retrieve.entry (map, optional): The configuration of intervention data retrieval.

  • options.retrieve.entry.disable (boolean, optional): Specifies whether to disable intervention data retrieval. Valid values: false (default) and true. Example: false

  • options.retrieve.entry.sf (float, optional): The vector relevance threshold for intervention data retrieval. Valid values: [0, 2.0]. Default value: 0.3. A smaller value results in higher relevance but fewer retrieved entries. A larger value may retrieve less relevant entries. Example: 0.3

  • options.retrieve.image (map, optional): The configuration of image retrieval.

  • options.retrieve.image.disable (boolean, optional): Specifies whether to disable image retrieval. Valid values: false (default) and true. Example: false

  • options.retrieve.image.sf (float, optional): The vector relevance threshold for image retrieval. Example: 1.0

    • If the sparse vector model is disabled, valid values range from 0 to 2.0 and the default value is 1.0. A smaller value results in higher relevance but fewer retrieved images. A larger value may retrieve less relevant images.

    • If the sparse vector model is enabled, the default value is 0.5. A larger value results in higher relevance but fewer retrieved images. A smaller value may retrieve less relevant images.

  • options.retrieve.image.dense_weight (float, optional): The weight of the dense vector during image retrieval if the sparse vector model is enabled. Valid values: (0.0, 1.0). Default value: 0.7. Example: 0.7

  • options.retrieve.qp (map, optional): The configuration of query rewriting.

  • options.retrieve.qp.query_extend (boolean, optional): Specifies whether to extend queries. The extended queries are used to retrieve document segments in OpenSearch. Example: false

    • false (default): does not extend queries.

    • true: extends queries. An additional interaction with the LLM is performed, which slows down the response. Do not extend queries for applications that require fast responses.

  • options.retrieve.qp.query_extend_num (integer, optional): The maximum number of extended queries if the query extension feature is enabled. Default value: 5. Example: 5

  • options.retrieve.rerank (map, optional): The reranking configuration for document retrieval.

  • options.retrieve.rerank.enable (boolean, optional): Specifies whether to use a model to rerank the retrieved results based on relevance. Valid values: true and false. If the options.retrieve.doc.formula parameter is specified, the default value is false; if it is left empty, the default value is true. Example: true

  • options.retrieve.rerank.model (string, optional): The name of the reranking model. Example: ops-bge-reranker-larger

    • ops-bge-reranker-larger (default): the bge-reranker model.

    • ops-text-reranker-001: the self-developed reranking model.

  • options.retrieve.return_hits (boolean, optional): Specifies whether to return document retrieval results. If you set this parameter to true, the search_hits parameter is returned in the response. Example: false

Sample request body

{
    "question" : {
        "text" : "user question",
        "session" : "The session of the conversation. You can specify this parameter to enable the multi-round conversation feature.",
        "type" : "TEXT"
    },
    "options": {
        "chat": {
            "disable" : false, # Specifies whether to disable LLM access and directly return document retrieval results. Default value: false, which indicates that LLM access is enabled. 
            "stream" : false, # Specifies whether to enable HTTP chunked transfer encoding. Default value: true.
            "model" : "Qwen", # The LLM to be used.
            "prompt_template" : "user_defined_prompt_name", # The name of the custom prompt template.
            "prompt_config" : { # Optional. The configuration of the custom prompt template.
                "key" : "value" # Specify a key-value pair.
            },
            "generate_config" : {
                "repetition_penalty": 1.01,
                "top_k": 50,
                "top_p": 0.5,
                "temperature": 0.7
            },
            "history_max": 20, # The maximum number of rounds of conversations based on which the system returns results.
            "link": false, # Specifies whether to return the URL of the reference source.
            "agent":{
                "tools":["knowledge_search"]
            }
        },
        "retrieve": {
            "doc": {
                "disable": false, # Specifies whether to disable document retrieval. Default value: false. 
                "filter": "category=\"type\"", # The filter that is used to filter documents based on the category field during document retrieval. By default, this parameter is left empty.
                "sf": 1.3,    # The threshold for determining the vector relevance for document retrieval. Default value: 1.3. The larger the value, the less relevant the retrieved documents.
                "top_n": 5,    # The number of documents to be retrieved. Valid values: (0, 50]. Default value: 5.
                "formula" : "", # The formula for document retrieval. By default, documents are retrieved based on vector similarity.
                "rerank_size" : 5, # The number of documents to be fine sorted. By default, you do not need to specify this parameter. The system automatically determines the number of documents to be fine sorted.
                "operator": "OR" # The operator between text tokens. Default value: AND.
            },
            "web_search":{
                      "enable": false # Specifies whether to enable the Internet search feature. Default value: false.
            },
            "entry": {
                "disable": false, # Specifies whether to disable intervention data retrieval. Default value: false. 
                "sf": 0.3 # The threshold for determining the vector relevance for intervention data retrieval. Default value: 0.3.
            },
            "image": {
                "disable": false,  # Specifies whether to disable image retrieval. Default value: false. 
                "sf": 1.0          # The threshold for determining the vector relevance for image retrieval. Default value: 1.0.
            },
            "qp": {
                "query_extend": false, # Specifies whether to extend queries.
                "query_extend_num": 5 # The maximum number of queries to be extended. Default value: 5.
            },
            "rerank" : {
                "enable": true, # Specifies whether to use the model to rerank the retrieved results. Default value: true.
                "model":"model_name" # The name of the reranking model.
            },
            "return_hits": false   # Specifies whether to return document retrieval results. If you set this parameter to true, the search_hits parameter is returned in the response.
        }
    }
}
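Putting the pieces together, a request can be sent over plain HTTP. The following sketch uses only the Python standard library; the host, application name, and API key are placeholders that you must replace, and the actual send is wrapped in a function so that the request construction can be inspected without network access.

```python
import json
import urllib.request

HOST = "http://example-endpoint"   # placeholder: see Obtain endpoints
APP = "my_app"                     # placeholder: your application name
API_KEY = "OS-xxxx"                # placeholder: see Manage API keys

url = "{}/v3/openapi/apps/{}/actions/knowledge-search".format(HOST, APP)
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer " + API_KEY,
}
body = {
    "question": {"text": "user question", "type": "TEXT", "session": ""},
    "options": {"chat": {"stream": False}},
}

req = urllib.request.Request(
    url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
)

def send(request):
    # Performs the HTTP call; requires network access and valid credentials.
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```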

Response parameters

  • request_id (string): The request ID.

  • status (string): Indicates whether the request was successful. Valid values: OK and FAIL.

  • latency (float): The amount of time that the server consumed to process a successful request. Unit: milliseconds.

  • id (integer): The primary key ID of the document.

  • title (string): The title of the document.

  • category (string): The name of the category.

  • url (string): The URL of the document.

  • answer (string): The returned result.

  • type (string): The format of the returned result.

  • scores (array): The relevance-based scores of the document.

  • code (string): The error code returned.

  • message (string): The error message returned.

Sample response body

{
  "request_id": "6859E98D-D885-4AEF-B61C-9683A0184744",
  "status": "OK",
  "latency": 6684.410397,
  "result" : {
    "data" : [
      {
        "answer" : "answer text",
        "type" : "TEXT",
        "reference" : [
          {"url" : "http://....","title":"doc title"}
        ]
      },
      {
        "reference": [
          {"id": "16","title": "Test title","category": "Test category","url": "Test URL"}
        ],
        "answer": "https://ecmb.bdimg.com/tam-ogel/-xxxx.jpg",
        "type": "IMAGE"
      }
    ],
    "search_hits" : [  // This parameter is returned only if the options.retrieve.return_hits parameter in the request is set to true.
      {
        "fields" : {
          "content" : "....",
          "key1" : "value1"
        },
        "scores" : ["10000.1234"],
        "type" : "doc"
      },
      {
        "fields" : {
          "answer" : "...",
          "key1" : "value1"
        },
        "scores" : ["10000.1234"],
        "type" : "entry"
      }
    ]
  },
  "errors" : [
    {
      "code" : "The error code that is returned if an error occurs.",
      "message" : "The error message that is returned if an error occurs."
    }
  ]
}
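A response in this shape can be unpacked as follows. `extract_answers` is an illustrative helper, not an SDK function; it tolerates a missing search_hits field, since that field is only returned when options.retrieve.return_hits is set to true.

```python
def extract_answers(resp):
    """Return (answers, references) from a knowledge-search response body.

    `answers` is a list of (type, answer) pairs; `references` is a flat
    list of the reference entries attached to each answer.
    """
    answers, references = [], []
    for item in resp.get("result", {}).get("data", []):
        answers.append((item.get("type"), item.get("answer")))
        references.extend(item.get("reference", []))
    return answers, references

# Trimmed version of the sample response above.
sample = {
    "status": "OK",
    "result": {
        "data": [
            {
                "answer": "answer text",
                "type": "TEXT",
                "reference": [{"url": "http://....", "title": "doc title"}],
            }
        ]
    },
}
answers, refs = extract_answers(sample)
```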