This document describes the API parameters for the OpenAI-compatible large model chat completion service.
Request parameters
Parameter | Type | Required | Description | Example value |
messages | List[Dict] | Yes | A list of messages in the conversation so far. | [ {"role": "system", "content": "You are a robot assistant"}, {"role": "user", "content": "What is the capital of Henan?"}, {"role": "assistant", "content": "Zhengzhou"}, {"role": "user", "content": "What are some fun places to visit there?"} ] |
model | String | Yes | The service ID of the model to call. For the list of supported service IDs, refer to the corresponding documentation. | ops-qwen-turbo |
max_tokens | Int | No | The maximum number of tokens to generate for the chat completion. If generation stops because this limit is reached before the response is complete, 'finish_reason' will be 'length'; otherwise it will be 'stop'. | 1024 |
temperature | Float | No | Controls the probability distribution over candidate tokens during generation, and therefore the randomness and diversity of the model's responses. The range is [0, 2); a value of 0 is not meaningful. Higher values flatten the probability distribution so that more low-probability tokens can be selected, producing more diverse output; lower values sharpen the distribution so that high-probability tokens are more likely to be selected, producing more deterministic output. | 1 |
top_p | Float | No | The probability threshold for nucleus sampling during generation, with a range of (0, 1.0). Larger values increase the randomness of generation; smaller values make it more deterministic. | 0.8 |
presence_penalty | Float | No | Controls repetition across the generated sequence. The range is [-2.0, 2.0], with a default of 0. Increasing presence_penalty reduces repetition in the model's output. | 0 |
frequency_penalty | Float | No | Frequency penalty value. The range is [-2.0, 2.0], with a default of 0. Positive values penalize new tokens based on their existing frequency in the text, reducing the likelihood of the model repeating the same phrases. | 0 |
stop | String, List[String] | No | Stop words or tokens: generation stops when the model is about to produce any of the specified elements, and the generated content does not include them. Can be a single string or an array of strings. The default is null. | null |
stream | Boolean | No | The 'stream' parameter determines whether to use streaming output. In stream mode, the interface returns results as a generator, which must be iterated to retrieve the incremental sequences. The default is false. | false |
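As a rough illustration, the request parameters above map onto a standard OpenAI-compatible chat completions call. The sketch below uses the openai Python SDK and covers both the non-streaming and streaming cases; the base_url, api_key, and endpoint path are placeholders and assumptions, not values defined by this document.

```python
# Minimal sketch of a request using the parameters above, via the OpenAI Python SDK.
# Assumptions: the service exposes an OpenAI-compatible /chat/completions endpoint;
# BASE_URL and API_KEY below are placeholders, not values defined by this document.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-service-endpoint/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                       # placeholder credential
)

messages = [
    {"role": "system", "content": "You are a robot assistant"},
    {"role": "user", "content": "What is the capital of Henan?"},
    {"role": "assistant", "content": "Zhengzhou"},
    {"role": "user", "content": "What are some fun places to visit there?"},
]

# Non-streaming call: the full response is returned at once.
response = client.chat.completions.create(
    model="ops-qwen-turbo",
    messages=messages,
    max_tokens=1024,
    temperature=1,
    top_p=0.8,
    presence_penalty=0,
    frequency_penalty=0,
    stream=False,
)
print(response.choices[0].message.content)

# Streaming call: stream=True returns a generator of incremental chunks
# that must be iterated to assemble the full reply.
stream = client.chat.completions.create(
    model="ops-qwen-turbo",
    messages=messages,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
```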
Response parameters
Parameter | Type | Description | Example value |
id | String | The system-generated unique identifier for the call. | 2244F3A8-4201-4F37-BF86-42013B1026D6 |
object | String | The type of object, consistently set as 'chat.completion'. | chat.completion |
created | Long | The Unix timestamp indicating when the response was created, measured in seconds. | 1719313883 |
model | String | The model used for generating the response. | ops-qwen-turbo |
choices.index | Int | The index of the generated result, with 0 indicating the first result. | 0 |
choices.message | Map | The content of the message generated by the model. | { "role":"assistant", "content":"This is an example" } |
choices.finish_reason | String | The reason the model stopped generating: 'stop' when the response finished naturally, or 'length' when the max_tokens limit was reached. | stop |
usage.completion_tokens | Int | The number of tokens used by the model to generate the response. | 150 |
usage.prompt_tokens | Int | The number of tokens representing the user's input to the model. | 180 |
usage.total_tokens | Int | The sum of tokens used for both the user's input and the model's response. | 330 |
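For orientation, the sketch below shows how the response fields above would typically be read from the non-streaming response object returned by the request example earlier. The field names follow the table; the response object itself is an assumption carried over from that hedged example.

```python
# Reading the documented response fields from a non-streaming chat completion
# (continues the request sketch above; `response` is that call's return value).
print(response.id)             # e.g. 2244F3A8-4201-4F37-BF86-42013B1026D6
print(response.object)         # "chat.completion"
print(response.created)        # Unix timestamp in seconds, e.g. 1719313883
print(response.model)          # e.g. "ops-qwen-turbo"

choice = response.choices[0]
print(choice.index)            # 0 for the first result
print(choice.message.role)     # "assistant"
print(choice.message.content)  # the generated text
print(choice.finish_reason)    # "stop" or "length"

usage = response.usage
print(usage.prompt_tokens)     # tokens in the user's input
print(usage.completion_tokens) # tokens in the model's response
print(usage.total_tokens)      # prompt_tokens + completion_tokens
```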
Status codes
For detailed information on status codes, see the status codes documentation.