Sends a multi-turn conversation to a hosted model and returns generated text via an OpenAI-compatible /chat/completions endpoint.
Endpoint
{host}/compatible-mode/v1/chat/completions
{host} is the service endpoint address. The service is reachable over the internet or through a Virtual Private Cloud (VPC). To find your address, see Query service endpoint.

Quick start
The following example sends a two-message conversation and returns a single completion:
curl http://xxxx-cn-shanghai.opensearch.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <your-api-key>" \
-d '{
"model": "ops-qwen-turbo",
"messages": [
{"role": "system", "content": "You are a robot assistant"},
{"role": "user", "content": "Recommend 1 science fiction book"}
]
}'
Replace <your-api-key> with your API key. For valid model values, see List of supported services.
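For callers who prefer Python to curl, the same request can be assembled with the standard library. This is a minimal sketch, not an official client: the helper name build_chat_request is hypothetical, and the host and key are the placeholders from the curl example. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_chat_request(host, api_key, model, messages):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    # Path taken from the Endpoint section; host is supplied by the caller.
    url = host.rstrip("/") + "/compatible-mode/v1/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    "http://xxxx-cn-shanghai.opensearch.aliyuncs.com",  # placeholder host
    "<your-api-key>",                                   # placeholder key
    "ops-qwen-turbo",
    [
        {"role": "system", "content": "You are a robot assistant"},
        {"role": "user", "content": "Recommend 1 science fiction book"},
    ],
)

# To actually send it (requires a real host and key):
# with urllib.request.urlopen(request) as resp:
#     reply = json.load(resp)
```

Separating construction from sending makes it easy to inspect the URL, headers, and body before any network call is made.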
Sample response:
{
"id": "fb4b3860e051ecad0b019971******",
"object": "chat.completion",
"created": 1749804786,
"model": "ops-qwen-turbo",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The 'Three-Body Problem' series by Liu Cixin. This is a story about......"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 22,
"completion_tokens": 48,
"total_tokens": 70
}
}
Request parameters
messages
| Field | Type | Required | Description |
|---|---|---|---|
| messages | List[Dict] | Yes | The conversation history. Each item has a role and content. |
Each message object requires two fields:
role: The speaker. Valid values: system, user, assistant.
content: The message text. Cannot be empty.
Role constraints:
| Role | Position | Description |
|---|---|---|
| system | Must be messages[0] if present | Sets the model's behavior for the session. Optional, but must appear first if included. |
| user | Any position after system | A message from the end user. |
| assistant | Any position after system | A message from the model. Use this to supply conversation history. |
user and assistant messages should alternate, as they would in the turns of a real conversation.
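The role constraints above can be checked on the client before a request is sent. The sketch below is illustrative only: the function names validate_messages and turns_alternate are hypothetical, and the service may enforce these rules differently. Alternation is treated as a separate, softer check because the text says messages "should" alternate.

```python
VALID_ROLES = {"system", "user", "assistant"}


def validate_messages(messages):
    """Return a list of problems; an empty list means the array passes."""
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"messages[{i}]: invalid role {role!r}")
        if role == "system" and i != 0:
            problems.append(f"messages[{i}]: system message must be messages[0]")
        if not msg.get("content"):
            problems.append(f"messages[{i}]: content cannot be empty")
    return problems


def turns_alternate(messages):
    """True if user and assistant messages alternate (system is ignored)."""
    roles = [m.get("role") for m in messages if m.get("role") != "system"]
    return all(a != b for a, b in zip(roles, roles[1:]))


history = [
    {"role": "system", "content": "You are a robot assistant"},
    {"role": "user", "content": "What is the capital of Henan?"},
    {"role": "assistant", "content": "Zhengzhou"},
    {"role": "user", "content": "What are some fun places to visit there?"},
]
print(validate_messages(history), turns_alternate(history))
```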
Example messages array:
[
{"role": "system", "content": "You are a robot assistant"},
{"role": "user", "content": "What is the capital of Henan?"},
{"role": "assistant", "content": "Zhengzhou"},
{"role": "user", "content": "What are some fun places to visit there?"}
]
model
| Field | Type | Required | Description | Example |
|---|---|---|---|---|
| model | String | Yes | The service ID that identifies which model to call. | ops-qwen-turbo |
For valid values, see List of supported services.
Generation parameters
| Parameter | Type | Required | Range | Default | Description |
|---|---|---|---|---|---|
| max_tokens | Int | No | - | - | Maximum number of tokens to generate. If the model reaches this limit before finishing, finish_reason is set to length. |
| temperature | Float | No | [0, 2) | - | Controls output randomness. Lower values produce more deterministic responses, which suits factual Q&A. Higher values produce more varied responses, which suits creative tasks. Note: a value of 0 is not meaningful, so use a value greater than 0. |
| top_p | Float | No | (0, 1.0) | - | Nucleus sampling threshold. Lower values restrict the token selection pool and increase determinism. Higher values allow more diverse word choices. |
| presence_penalty | Float | No | [-2.0, 2.0] | 0 | Penalizes tokens that have appeared anywhere in the output so far, reducing repetition of topics. |
| frequency_penalty | Float | No | [-2.0, 2.0] | 0 | Penalizes tokens based on how often they appear in the output so far, reducing repetition of specific phrases. |
| stop | String or List[String] | No | - | null | One or more sequences that stop generation when encountered. The stop sequence itself is not included in the output. |
| stream | Boolean | No | - | false | Set to true to receive output as a stream of incremental chunks. In stream mode, the interface returns results as a generator, which must be iterated to retrieve each incremental chunk. |
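The interval notation in the table mixes open and closed bounds, which is easy to get wrong in client code. The following sketch encodes the documented ranges as a pre-flight check. The function name check_generation_params is hypothetical, and how the service itself responds to out-of-range values is not specified here.

```python
def check_generation_params(params):
    """Return a list of range violations for the documented sampling parameters."""
    errors = []

    temperature = params.get("temperature")
    # [0, 2): 0 is allowed by the range, 2 is not.
    if temperature is not None and not (0 <= temperature < 2):
        errors.append("temperature must be in [0, 2)")

    top_p = params.get("top_p")
    # (0, 1.0): both endpoints are excluded.
    if top_p is not None and not (0 < top_p < 1.0):
        errors.append("top_p must be in (0, 1.0)")

    for name in ("presence_penalty", "frequency_penalty"):
        value = params.get(name)
        # [-2.0, 2.0]: both endpoints are included.
        if value is not None and not (-2.0 <= value <= 2.0):
            errors.append(f"{name} must be in [-2.0, 2.0]")

    return errors


print(check_generation_params({"temperature": 0.7, "top_p": 0.9}))
print(check_generation_params({"temperature": 2, "top_p": 1.0}))
```

Parameters that are absent are skipped, since all of them are optional.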
Response parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| id | String | The request ID. | 2244F3A8-4201-4F37-BF86-42013B1026D6 |
| object | String | Always chat.completion. | chat.completion |
| created | Long | Unix timestamp (seconds) when the response was created. | 1719313883 |
| model | String | The service ID used to generate the response. | ops-qwen-turbo |
| choices.index | Int | Index of this result. 0 is the first result. | 0 |
| choices.message | Map | The model's response message, with role and content fields. | {"role": "assistant", "content": "This is an example"} |
| choices.finish_reason | String | Reason generation stopped. See Finish reasons. | stop |
| usage.prompt_tokens | Int | Number of tokens in the input messages. | 180 |
| usage.completion_tokens | Int | Number of tokens in the generated response. | 150 |
| usage.total_tokens | Int | Total tokens used (prompt_tokens + completion_tokens). | 330 |
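To show how these fields fit together, the sketch below pulls the reply text, finish reason, and token counts out of a parsed response. It assumes the body has already been decoded from JSON into a dict with the shape of the sample response in the quick start; the helper name summarize_response is hypothetical.

```python
def summarize_response(response):
    """Extract the first choice's text and the usage totals from a parsed response."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage["total_tokens"],
        # total_tokens is documented as prompt_tokens + completion_tokens.
        "tokens_consistent": usage["total_tokens"]
        == usage["prompt_tokens"] + usage["completion_tokens"],
    }


# The sample response from the quick start, as a Python dict.
sample = {
    "id": "fb4b3860e051ecad0b019971******",
    "object": "chat.completion",
    "created": 1749804786,
    "model": "ops-qwen-turbo",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The 'Three-Body Problem' series by Liu Cixin. This is a story about......",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 22, "completion_tokens": 48, "total_tokens": 70},
}

summary = summarize_response(sample)
print(summary["text"])
```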
Finish reasons
| Value | Meaning |
|---|---|
| stop | The model returned a complete response. |
| length | Generation stopped because max_tokens was reached. Increase max_tokens to get longer output. |
| content_filter* | The response was filtered by content safety. Values starting with content_filter indicate a safety filter result. |
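Because content filter values are matched by prefix rather than by exact string, client code typically needs a small dispatch on finish_reason. The sketch below is one possible way to do that. The function name and the returned labels are hypothetical, and any value not listed in the table is reported as unknown rather than guessed at.

```python
def interpret_finish_reason(reason):
    """Map a finish_reason value to a coarse label for client-side handling."""
    if reason == "stop":
        return "complete"
    if reason == "length":
        # Output was cut off; a retry with a larger max_tokens may help.
        return "truncated"
    if reason.startswith("content_filter"):
        # Any value beginning with content_filter indicates a safety filter result.
        return "filtered"
    return "unknown"


for value in ("stop", "length", "content_filter"):
    print(value, "->", interpret_finish_reason(value))
```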
Status codes
For HTTP status code details, see Status codes.