This topic describes the API operation that is used to converse with a large language model (LLM). You can configure prompt parameters to specify questions or conversation content. You can also select an LLM based on your business requirements and configure settings such as whether to moderate the results that are generated by the LLM.
Prerequisites
An API key for identity authentication is obtained. When you call the API operations of OpenSearch LLM-Based Conversational Search Edition, you must be authenticated. For more information, see Manage API keys.
An endpoint is obtained. When you call the API operations of OpenSearch LLM-Based Conversational Search Edition, you must specify an endpoint. For more information, see Obtain endpoints.
Operation information
| Request method | Request protocol | Request data format |
| --- | --- | --- |
| POST | HTTP | JSON |
Request URL
{host}/v3/openapi/apps/{app_group_identity}/actions/knowledge-llm
- {host}: the endpoint that is used to call the API operation. You can call the API operation over the Internet or a virtual private cloud (VPC). For more information about how to obtain an endpoint, see Obtain endpoints.
- {app_group_identity}: the name of the application that you want to access. You can log on to the OpenSearch LLM-Based Conversational Search Edition console and view the application name of the corresponding instance on the Instance Management page.
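The following Python sketch shows how a request to this URL might be assembled and sent. The endpoint, the application name, and the Authorization header format are assumptions for illustration; check Obtain endpoints and Manage API keys for the values and authentication scheme that apply to your instance.

```python
# Minimal sketch of calling the knowledge-llm API operation.
# HOST, APP, API_KEY, and the Authorization header format are
# assumptions for illustration, not values from this topic.
import requests

HOST = "http://example-endpoint.aliyuncs.com"  # hypothetical endpoint
APP = "my_app"                                 # hypothetical application name
API_KEY = "YOUR_API_KEY"

url = f"{HOST}/v3/openapi/apps/{APP}/actions/knowledge-llm"
body = {
    "question": "what is OpenSearch",
    "type": "text",
    "content": ["Candidate content 1", "Candidate content 2"],
    "options": {"stream": False},
    "model": "Qwen",
    "csi_level": "none",
}
resp = requests.post(
    url,
    json=body,
    headers={"Authorization": f"Bearer {API_KEY}"},  # assumed header format
    timeout=30,
)
print(resp.json())
```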
Request parameters
| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| question | String | The question asked by the user. | what is OpenSearch |
| type | String | The type of the question. | text |
| content | Array | The content of the document. | ["OpenSearch","Havenask"] |
| options | Json | The parameter options. | |
| stream | Boolean | Specifies whether to enable HTTP chunked transfer encoding. This parameter is nested in options. Default value: false. For a consumption sketch, see the example after this table. | false |
| prompt | String | The prompt content. If you specify this parameter, the values of the question and content parameters do not take effect. | |
| model | String | The LLM to be used. Valid values depend on the region. For example, qwen-turbo is a valid value in the Singapore region. | "qwen-turbo" |
| csi_level | String | Specifies whether to moderate the results that are generated by the LLM. Valid values: none: does not moderate the results. loose: moderates the results and blocks them if restricted content is detected. | none |
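If options.stream is set to true, the result is returned over HTTP chunked transfer encoding. The following sketch, which reuses url, body, and API_KEY from the earlier example, shows one way to consume such a response; the exact framing of each chunk is an assumption, so adjust the parsing to what your instance actually returns.

```python
# Minimal sketch of consuming a chunked (streamed) response.
# Reuses url, body, and API_KEY from the previous sketch.
import requests

resp = requests.post(
    url,
    json={**body, "options": {"stream": True}},      # enable chunked transfer
    headers={"Authorization": f"Bearer {API_KEY}"},  # assumed header format
    stream=True,   # let requests yield data as it arrives
    timeout=300,
)
for chunk in resp.iter_lines():  # one chunk per line is an assumption
    if chunk:
        print(chunk.decode("utf-8"))
```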
Sample request
1. Summarize based on the content of the document.
```json
{
    "question" : "The question that is asked by the user.",
    "type" : "text",
    "content" : ["Candidate content 1","Candidate content 2"],
    "options" : {
        "stream" : false # Specifies whether to enable HTTP chunked transfer encoding. Default value: false.
    },
    "model": "Qwen",
    "csi_level" : "none" # The content moderation configurations for the results that are generated by the LLM. none: does not moderate the results. loose: moderates the results and blocks the results if restricted content is detected.
}
```
2. Directly specify the prompt.
```json
{
    "prompt": "The prompt content.",
    "model": "Qwen",
    "options" : {
        "stream" : false # Specifies whether to enable HTTP chunked transfer encoding. Default value: false.
    },
    "csi_level" : "none" # The content moderation configurations for the results that are generated by the LLM. none: does not moderate the results. loose: moderates the results and blocks the results if restricted content is detected.
}
```
If you specify the prompt parameter, the values of the question and content parameters do not take effect. The value of the prompt parameter is directly used as the prompt of the LLM.
Response parameters
| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| request_id | String | The request ID. | abc123-ABC |
| latency | Float | The latency. Unit: seconds. | 10.0 |
| result | Json | The result. | |
| data | Array | The returned dataset. This parameter is nested in result. | |
| answer | String | The answer. This parameter is nested in data. | answer text |
| type | String | The type of the answer. This parameter is nested in data. | text |
| errors | Array | The errors. | |
| code | String | The error code. This parameter is nested in errors. | 1001 |
| message | String | The error message. This parameter is nested in errors. | APP is not found |
Sample response
```json
{
    "request_id" : "",
    "latency" : 10.0, # Unit: seconds.
    "result" : {
        "data": [
            {
                "answer" : "answer text",
                "type" : "text"
            }
        ]
    },
    "errors" : [
        {
            "code" : "The error code that is returned if an error occurs.",
            "message" : "The error message that is returned if an error occurs."
        }
    ]
}
```
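As a usage note, the following sketch reads the fields described above from a non-streamed response. It assumes the resp object from the first request sketch and does nothing beyond printing.

```python
# Minimal sketch of reading the response fields described above.
# Assumes "resp" from the first (non-streamed) request sketch.
result = resp.json()

print(result.get("request_id"), result.get("latency"))

for item in result.get("result", {}).get("data", []):
    if item.get("type") == "text":
        print(item["answer"])  # the generated answer

for err in result.get("errors", []):  # present when the call fails
    print(f'error {err.get("code")}: {err.get("message")}')
```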