
OpenSearch: Perform conversations with an LLM

Last Updated: Jun 17, 2025

This topic describes the API used to perform conversations with a large language model (LLM). You can configure prompt parameters to specify questions or conversation content. You can also select an LLM based on your business requirements and configure settings such as whether to moderate the results that are generated by the LLM.

Prerequisites

  • An API key for identity authentication is obtained. You must be authenticated when you call the API operations of OpenSearch LLM-Based Conversational Search Edition. For more information, see Manage API keys.

  • An endpoint is obtained. When you call the API operations of OpenSearch LLM-Based Conversational Search Edition, you must specify an endpoint. For more information, see Obtain endpoints.

Operation information

| Request method | Request protocol | Request data format |
| --- | --- | --- |
| POST | HTTP | JSON |

Request URL

{host}/v3/openapi/apps/{app_group_identity}/actions/knowledge-llm
  • {host}: the endpoint that is used to call the API operation. You can call the API operation over the Internet or a virtual private cloud (VPC). For more information about how to obtain an endpoint, see Obtain endpoints.

  • {app_group_identity}: the name of the application that you want to access. You can log on to the OpenSearch LLM-Based Conversational Search Edition console and view the application name of the corresponding instance on the Instance Management page.
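Putting the two placeholders together, the request URL can be composed as in the following Python sketch. Both values below are placeholders for illustration, not a real endpoint or application name.

```python
def build_url(host: str, app_group_identity: str) -> str:
    """Substitute the endpoint and application name into the URL template."""
    return f"{host}/v3/openapi/apps/{app_group_identity}/actions/knowledge-llm"

url = build_url("http://example-endpoint", "my_app")
print(url)  # http://example-endpoint/v3/openapi/apps/my_app/actions/knowledge-llm
```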

Request parameters

Parameter

Type

Description

Example

question

String

The question asked by the user.

what is OpenSearch

type

String

The type of the question.

text

content

Array

The content of the document.

["OpenSearch","Havenask"]

options

Json

The parameter options.

stream

Boolean

Specifies whether to enable HTTP chunked transfer encoding.

false

prompt

String

The prompt content.

model

String

The LLM to be used.

Valid values for the Singapore region:

  • opensearch-llama2-13b

  • opensearch-falcon-7b

  • qwen-turbo

  • qwen-plus

  • qwen-max

"qwen-turbo"

csi_level

String

Specifies whether to moderate the results that are generated by the LLM. Valid values:

  • none: does not moderate the content.

  • loose: moderates the results and blocks the results if restricted content is detected.

none

Sample request

1. Summarize based on the content of the document.

{
  "question" : "The question that is asked by the user.",
  "type" : "text",
  "content" : ["Candidate content 1","Candidate content 2"],
  "options" : {
     "stream" : false  # Specifies whether to enable HTTP chunked transfer encoding. Default value: false.
  },
  "model": "qwen-turbo",
  "csi_level" : "none"  # The content moderation configuration for the results generated by the LLM. none: does not moderate the results. loose: moderates the results and blocks them if restricted content is detected.
}

2. Directly specify the prompt.

{
  "prompt": "The prompt content.",
  "model": "qwen-turbo",
  "options" : {
    "stream" : false  # Specifies whether to enable HTTP chunked transfer encoding. Default value: false.
  },
  "csi_level" : "none"  # The content moderation configuration for the results generated by the LLM. none: does not moderate the results. loose: moderates the results and blocks them if restricted content is detected.
}
Note

If you specify the prompt parameter, the values of the question and content parameters do not take effect. The value of the prompt parameter is directly used as the prompt of the LLM.
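As a minimal sketch, a non-streaming call could be assembled as follows using only the Python standard library. The endpoint, application name, and API key are placeholders, and the Authorization header format is an assumption; verify it against Manage API keys for your instance.

```python
import json
import urllib.request

def build_payload(question, content, model="qwen-turbo", stream=False, csi_level="none"):
    """Assemble the request body described in the parameter table above."""
    return {
        "question": question,
        "type": "text",
        "content": content,
        "options": {"stream": stream},
        "model": model,
        "csi_level": csi_level,
    }

def ask(host, app_group_identity, api_key, payload):
    """POST the payload to the knowledge-llm action and return the parsed JSON."""
    url = f"{host}/v3/openapi/apps/{app_group_identity}/actions/knowledge-llm"
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Header name and format are assumptions; see Manage API keys.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Build a request body; calling ask(...) would perform the actual HTTP request.
payload = build_payload("what is OpenSearch", ["OpenSearch", "Havenask"])
print(json.dumps(payload))
```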

Response parameters

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| request_id | String | The request ID. | abc123-ABC |
| latency | Float | The request latency. Unit: seconds. | 10.0 |
| result | JSON | The result. | |
| data | Array | Nested in result. The returned dataset. | |
| answer | String | Nested in data. The answer. | answer text |
| type | String | Nested in data. The type of the answer. | text |
| errors | Array | The errors. | |
| code | String | Nested in errors. The error code. | 1001 |
| message | String | Nested in errors. The error message. | APP is not found |

Sample response

{
  "request_id" : "",
  "latency" : 10.0,  # Unit: seconds.
  "result" : {
    "data": [
      {
        "answer" : "answer text",
        "type" : "text"
      }
    ]
  },
  "errors" : [
    {
      "code" : "The error code that is returned if an error occurs.",
      "message" : "The error message that is returned if an error occurs."
    }
  ]
}
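A response shaped like the sample above can be unpacked as in the following sketch. It assumes the errors array is empty or absent when the call succeeds; verify that assumption against responses from your own instance.

```python
def extract_answers(response: dict) -> list:
    """Return all answer strings from result.data, or raise if errors were reported."""
    errors = response.get("errors") or []
    if errors:
        first = errors[0]
        raise RuntimeError(f"{first.get('code')}: {first.get('message')}")
    return [item["answer"] for item in response.get("result", {}).get("data", [])]

sample = {
    "request_id": "abc123-ABC",
    "latency": 10.0,
    "result": {"data": [{"answer": "answer text", "type": "text"}]},
}
print(extract_answers(sample))  # ['answer text']
```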