OpenAI compatible LLM response - OpenSearch - Alibaba Cloud Documentation Center

This document describes the API parameters for the OpenAI compatible content generation service.

URL

{host}/compatible-mode/v1/chat/completions

host: The address for calling the service. You can call the API service through either the internet or VPC. For more information, see Query service endpoint.
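As a small illustration of how the full request URL is formed, the sketch below joins a host with the path shown above. The function name `build_chat_url` is hypothetical and not part of any SDK; the host is whatever endpoint you obtained for your instance, including the scheme.

```python
# Path taken from the URL section of this document.
CHAT_COMPLETIONS_PATH = "/compatible-mode/v1/chat/completions"


def build_chat_url(host: str) -> str:
    """Join a service host (scheme included) with the chat completions path."""
    # Strip trailing slashes so the joined URL never contains a doubled "/".
    return host.rstrip("/") + CHAT_COMPLETIONS_PATH
```

The result can be used as the request URL for either an internet or a VPC endpoint.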


Request parameters

Parameter	Type	Required	Description	Example value
messages	List[Dict]	Yes	The conversation so far, as a list of messages. Each message has a role and a content field. role: valid values are system, user, and assistant. system: a system message. It is optional, but if present it must be the first message in the list (messages[0]). user and assistant: the turns of the user and the model. These two roles should alternate to reflect an actual conversation flow. content: the text of the message. It cannot be empty.	[ {"role": "system", "content": "You are a robot assistant"}, {"role": "user", "content": "What is the capital of Henan?"}, {"role": "assistant", "content": "Zhengzhou"}, {"role": "user", "content": "What are some fun places to visit there?"} ]
model	String	Yes	The service ID. For a list of supported service IDs, refer to List of supported services.	ops-qwen-turbo
max_tokens	Int	No	The maximum number of tokens to generate for the chat completion. If generation stops because this limit is reached, finish_reason is 'length'; otherwise, it is 'stop'.	1024
temperature	Float	No	Controls the randomness and diversity of the model's responses by adjusting the probability distribution over candidate words during generation. The range is [0, 2); a value of 0 is not meaningful. Higher values flatten the distribution so that more low-probability words can be selected, giving more diverse output. Lower values sharpen the distribution so that high-probability words are more likely to be selected, giving more deterministic output.	1
top_p	Float	No	The probability threshold for nucleus sampling during generation, with a range of (0, 1.0). A larger value increases the randomness of generation, while a smaller value enhances determinism.	0.8
presence_penalty	Float	No	Controls the repetition of the entire sequence when the model generates text. The range is [-2.0, 2.0], with a default value of 0. Increasing the presence_penalty can reduce the repetition of the model's output.	0
frequency_penalty	Float	No	Frequency penalty value. The range is [-2.0, 2.0], with a default value of 0. Positive values penalize new words based on how often they already appear in the text so far, reducing the likelihood that the model repeats the same phrases.	0
stop	String, List[String]	No	One or more stop strings. The model stops generating when it is about to output one of them, and the returned content does not include the stop string. The value can be a single string or an array of strings. The default is null.	null
stream	Boolean	No	Determines whether to use streaming output. In stream mode, the interface returns results as a generator, which must be iterated to retrieve the incremental sequences. The default is false.	false
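The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: `validate_messages` and `build_payload` are hypothetical helper names, and the checks only mirror the rules stated in this document (allowed roles, system message at messages[0], non-empty content, and the documented numeric ranges). The alternation of user and assistant is described as a recommendation, so it is not enforced here.

```python
from typing import Any, Dict, List

VALID_ROLES = {"system", "user", "assistant"}


def validate_messages(messages: List[Dict[str, str]]) -> None:
    """Raise ValueError if messages break the rules in the parameter table."""
    if not messages:
        raise ValueError("messages must not be empty")
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            raise ValueError(f"messages[{i}]: invalid role {role!r}")
        if role == "system" and i != 0:
            raise ValueError("a system message is only allowed at messages[0]")
        if not msg.get("content"):
            raise ValueError(f"messages[{i}]: content cannot be empty")


def build_payload(model: str, messages: List[Dict[str, str]], **options: Any) -> Dict[str, Any]:
    """Build the JSON body; optional parameters left as None are omitted."""
    validate_messages(messages)
    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature < 2:
        raise ValueError("temperature must be in [0, 2)")
    top_p = options.get("top_p")
    if top_p is not None and not 0 < top_p < 1:
        raise ValueError("top_p must be in (0, 1.0)")
    for name in ("presence_penalty", "frequency_penalty"):
        value = options.get(name)
        if value is not None and not -2.0 <= value <= 2.0:
            raise ValueError(f"{name} must be in [-2.0, 2.0]")
    payload: Dict[str, Any] = {"model": model, "messages": messages}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```

For example, `build_payload("ops-qwen-turbo", messages, max_tokens=1024, top_p=0.8)` returns a dictionary that can be serialized with `json.dumps` and sent as the request body.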

Response parameters

Parameter	Type	Description	Example value
id	String	The system-generated unique ID for the call.	2244F3A8-4201-4F37-BF86-42013B1026D6
object	String	The object type, which is always 'chat.completion'.	chat.completion
created	Long	The Unix timestamp indicating when the response was created, measured in seconds.	1719313883
model	String	The model used for generating the response.	ops-qwen-turbo
choices.index	Int	The index of the generated result, with 0 indicating the first result.	0
choices.message	Map	The content of the message generated by the model.	{ "role":"assistant", "content":"This is an example" }
choices.finish_reason	String	The reason generation stopped, returned for both segmented and streaming output. stop: the model returned a complete output. length: generation stopped because the output became too long; to allow longer output, increase the max_tokens value in the request parameters. A value starting with content_filter indicates the result of safety filtering.	stop
usage.completion_tokens	Int	The number of tokens used by the model to generate the response.	150
usage.prompt_tokens	Int	The number of tokens representing the user's input to the model.	180
usage.total_tokens	Int	The sum of tokens used for both the user's input and the model's response.	330
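To show how these fields fit together, here is a minimal sketch that reads a parsed, non-streaming response body. The function name `summarize_response` and the keys of its return value are hypothetical; only the response field names come from the table above.

```python
from typing import Any, Dict


def summarize_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the reply text and basic diagnostics from a parsed response."""
    choice = response["choices"][0]  # index 0 is the first generated result
    finish_reason = choice["finish_reason"]
    usage = response["usage"]
    return {
        "content": choice["message"]["content"],
        # 'length' means the output was cut off; consider raising max_tokens.
        "truncated": finish_reason == "length",
        # Values starting with 'content_filter' report safety filtering.
        "filtered": finish_reason.startswith("content_filter"),
        # total_tokens is documented as the sum of the other two counts.
        "usage_consistent": usage["prompt_tokens"] + usage["completion_tokens"]
        == usage["total_tokens"],
    }
```

A caller might retry with a larger max_tokens when `truncated` is true, or surface a message to the user when `filtered` is true.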

Curl request example

curl http://xxxx-cn-shanghai.opensearch.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer Your-API-Key" \
  -d '{
        "model":"ops-qwen-turbo",
        "messages":[
            {"role": "system", "content": "You are a robot assistant"},
            {"role": "user", "content": "Recommend 1 science fiction book"}
        ]
  }'
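The same request can be prepared in Python with only the standard library. This is a sketch under assumptions rather than an official client: `make_request` and `send` are hypothetical helper names, and the host and API key are the same placeholders used in the curl example.

```python
import json
import urllib.request
from typing import Any, Dict


def make_request(host: str, api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the chat completions endpoint."""
    url = host.rstrip("/") + "/compatible-mode/v1/chat/completions"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the prepared request and parse the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body without any network access.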

Sample response

{
  "id": "fb4b3860e051ecad0b019971******",
  "object": "chat.completion",
  "created": 1749804786,
  "model": "ops-qwen-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The 'Three-Body Problem' series by Liu Cixin. This is a story about......"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 48,
    "total_tokens": 70
  }
}

Status codes

For detailed information on status codes, see Status codes.