OpenSearch: Large model responses

Last Updated: Sep 02, 2024

This document describes the request and response parameters of the OpenAI-compatible large model response service.

Request parameters

Each request parameter below is listed with its type, whether it is required, a description, and an example value.

messages (List[Dict], required)

A list of messages in the conversation so far.

  • role: the role of the message. Valid values are system, user, and assistant.

    • system: a system-level message, which can only be used as the first message in the conversation history (messages[0]). The system role is optional, but if it is present, it must be at the beginning of the list.

    • user and assistant: represent the conversation between the user and the model. These two roles should alternate in the conversation to simulate an actual conversation flow.

  • content: the message content. It cannot be empty.

Example value:

[
  {"role": "system", "content": "You are a robot assistant"},
  {"role": "user", "content": "What is the capital of Henan?"},
  {"role": "assistant", "content": "Zhengzhou"},
  {"role": "user", "content": "What are some fun places to visit there?"}
]
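
A minimal request sketch using the OpenAI Python SDK, which works against OpenAI-compatible endpoints such as this service. The base_url and api_key below are placeholders, not values from this document; replace them with the endpoint and credentials of your own OpenSearch instance.

from openai import OpenAI

# Placeholder endpoint and credential: substitute your own OpenSearch values.
client = OpenAI(
    base_url="https://<your-opensearch-endpoint>/v1",
    api_key="<your-api-key>",
)

response = client.chat.completions.create(
    model="ops-qwen-turbo",
    messages=[
        {"role": "system", "content": "You are a robot assistant"},
        {"role": "user", "content": "What is the capital of Henan?"},
        {"role": "assistant", "content": "Zhengzhou"},
        {"role": "user", "content": "What are some fun places to visit there?"},
    ],
)

print(response.choices[0].message.content)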

model (String, required)

The service ID. For a list of supported service IDs, refer to the supported services documentation.

Example value: ops-qwen-turbo

max_tokens (Int, optional)

The maximum number of tokens to generate for the chat completion. If this limit is reached before the conversation is concluded, finish_reason is 'length'; otherwise, it is 'stop'.

Example value: 1024

temperature (Float, optional)

Controls the probability distribution over candidate tokens during generation, and therefore the randomness and diversity of the model's responses. The range is [0, 2); a value of 0 is not meaningful.

Higher temperature values lower the peak of the probability distribution, allowing more low-probability tokens to be selected and producing more diverse output; lower values sharpen the peak, making high-probability tokens more likely to be selected and producing more deterministic output.

Example value: 1

top_p (Float, optional)

The probability threshold for nucleus sampling during generation. The range is (0, 1.0). A larger value increases the randomness of the output; a smaller value makes it more deterministic.

Example value: 0.8

presence_penalty (Float, optional)

Controls repetition across the generated sequence. The range is [-2.0, 2.0], and the default value is 0. Increasing presence_penalty reduces repetition in the model's output.

Example value: 0

frequency_penalty (Float, optional)

Frequency penalty. The range is [-2.0, 2.0], and the default value is 0. Positive values penalize new words based on their current frequency in the text, reducing the likelihood that the model repeats the same phrases.

Example value: 0

stop (String or List[String], optional)

One or more stop words or tokens. Generation stops when the model is about to produce any of them, and the specified stop elements are excluded from the generated content. The value can be a single string or an array of strings. The default is null.

Example value: null
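
A sketch combining the optional generation controls above in a single request. The values shown are illustrative rather than recommendations, and the endpoint and API key remain placeholders.

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-opensearch-endpoint>/v1",  # placeholder
    api_key="<your-api-key>",                          # placeholder
)

response = client.chat.completions.create(
    model="ops-qwen-turbo",
    messages=[{"role": "user", "content": "Introduce Zhengzhou in three sentences."}],
    max_tokens=1024,        # cap on generated tokens; finish_reason becomes 'length' if hit
    temperature=1,          # higher values give more diverse output
    top_p=0.8,              # nucleus-sampling threshold in (0, 1.0)
    presence_penalty=0,     # raise above 0 to reduce repetition
    frequency_penalty=0,    # positive values penalize frequently used words
    stop=["\n\n"],          # illustrative stop sequence; the default is null
)

print(response.choices[0].message.content)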

stream (Boolean, optional)

Specifies whether to use streaming output. In stream mode, the interface returns results as a generator, which must be iterated to retrieve the incremental sequences. The default is false.

Example value: false
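
A streaming sketch, again with a placeholder endpoint. With stream=True, the OpenAI Python SDK yields incremental chunks whose delta fields carry the newly generated text.

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-opensearch-endpoint>/v1",  # placeholder
    api_key="<your-api-key>",                          # placeholder
)

# stream=True returns a generator of incremental chunks instead of a single response.
stream = client.chat.completions.create(
    model="ops-qwen-turbo",
    messages=[{"role": "user", "content": "What are some fun places to visit in Zhengzhou?"}],
    stream=True,
)

for chunk in stream:
    # Some chunks may carry no choices or an empty delta; skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()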

Response parameters

Each response field below is listed with its type, a description, and an example value.

id (String)

The unique identifier that the system generates for the call.

Example value: 2244F3A8-4201-4F37-BF86-42013B1026D6

object (String)

The object type, which is always 'chat.completion'.

Example value: chat.completion

created (Long)

The Unix timestamp, in seconds, indicating when the response was created.

Example value: 1719313883

model (String)

The model used to generate the response.

Example value: ops-qwen-turbo

choices.index (Int)

The index of the generated result; 0 indicates the first result.

Example value: 0

choices.message (Map)

The message generated by the model.

Example value:

{
  "role": "assistant",
  "content": "This is an example"
}

choices.finish_reason (String)

The reason the model stopped generating content:

  • stop: the model returned a complete output.

  • length: generation stopped because the content became too long. To increase the length of the generated content, raise the max_tokens value in the request parameters.

  • Values starting with content_filter indicate the result of safety filtering.

Example value: stop
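
A sketch of acting on finish_reason, retrying with a doubled token budget (up to a cap) when the output was truncated. The retry policy, budgets, and helper function are illustrative assumptions, not part of this API.

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-opensearch-endpoint>/v1",  # placeholder
    api_key="<your-api-key>",                          # placeholder
)

def ask(prompt: str, max_tokens: int = 256) -> str:
    response = client.chat.completions.create(
        model="ops-qwen-turbo",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    choice = response.choices[0]
    if choice.finish_reason == "length" and max_tokens < 2048:
        # Output was truncated; retry with a larger budget (illustrative policy).
        return ask(prompt, max_tokens=max_tokens * 2)
    if choice.finish_reason.startswith("content_filter"):
        # Safety filtering applied; handle according to your application's policy.
        return ""
    return choice.message.content

print(ask("Summarize the history of Zhengzhou."))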

usage.completion_tokens (Int)

The number of tokens in the response generated by the model.

Example value: 150

usage.prompt_tokens (Int)

The number of tokens in the user's input to the model.

Example value: 180

usage.total_tokens (Int)

The total number of tokens in the user's input and the model's response.

Example value: 330
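
A sketch that reads the fields above from the OpenAI SDK response object; the attribute names follow the OpenAI-compatible schema described in this table, and the endpoint is still a placeholder.

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-opensearch-endpoint>/v1",  # placeholder
    api_key="<your-api-key>",                          # placeholder
)

response = client.chat.completions.create(
    model="ops-qwen-turbo",
    messages=[{"role": "user", "content": "What is the capital of Henan?"}],
)

print("id:", response.id)                                    # system-generated call identifier
print("object:", response.object)                            # always 'chat.completion'
print("created:", response.created)                          # Unix timestamp in seconds
print("model:", response.model)                              # e.g. ops-qwen-turbo
print("content:", response.choices[0].message.content)       # generated message
print("finish_reason:", response.choices[0].finish_reason)   # stop, length, or content_filter*
print("prompt_tokens:", response.usage.prompt_tokens)
print("completion_tokens:", response.usage.completion_tokens)
print("total_tokens:", response.usage.total_tokens)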

Status codes

For detailed information on status codes, see the status codes documentation.