Migrate existing OpenAI code to Alibaba Cloud Model Studio by updating three values: the API key, base_url, and model name.
base_url
SDK base_url

| Region | base_url |
| --- | --- |
| Singapore | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 |
| Virginia | |
| Beijing | https://dashscope.aliyuncs.com/compatible-mode/v1 |
HTTP endpoint

| Region | Endpoint |
| --- | --- |
| Singapore | POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions |
| Virginia | |
| Beijing | POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions |
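Putting the three values together, region selection can be sketched as a small lookup. This is an illustration only: the `BASE_URLS` dictionary and `base_url_for` helper are our own, not part of any SDK, and the Beijing URL is the standard Chinese Mainland value (the Virginia URL is omitted because it is not listed above).

```python
# Illustrative helper: pick the Model Studio base_url for your region.
BASE_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}

def base_url_for(region: str) -> str:
    return BASE_URLS[region.lower()]

print(base_url_for("Singapore"))
# https://dashscope-intl.aliyuncs.com/compatible-mode/v1
```

Pass the selected value as `base_url` when constructing the client, alongside your API key and model name.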
Supported models
Global
Commercial
- Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshots
- Qwen-Plus series: qwen-plus, qwen-plus-latest, qwen-plus-2025-01-25 and later snapshots
- Qwen-Flash series: qwen-flash, qwen-flash-2025-07-28 and later snapshots
Open-source
qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b
International
Commercial
- Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshots, qwen-max, qwen-max-latest, qwen-max-2025-01-25 and later snapshots
- Qwen-Plus series: qwen3.5-plus, qwen3.5-plus-2026-02-15 and later snapshots, qwen-plus, qwen-plus-latest, qwen-plus-2025-01-25 and later snapshots
- Qwen-Flash series: qwen-flash, qwen-flash-2025-07-28 and later snapshots
- Qwen-Turbo series: qwen-turbo, qwen-turbo-latest, qwen-turbo-2024-11-01 and later snapshots
- Qwen-Coder series: qwen3-coder-plus, qwen3-coder-plus-2025-07-22 and later snapshots, qwen3-coder-flash, qwen3-coder-flash-2025-07-28 and later snapshots
- QwQ series: qwq-plus
Open-source
qwen3.5-397b-a17b
qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b
qwen2.5-14b-instruct-1m, qwen2.5-7b-instruct-1m, qwen2.5-72b-instruct, qwen2.5-32b-instruct, qwen2.5-14b-instruct, qwen2.5-7b-instruct
US
Commercial
- Qwen-Plus series: qwen-plus-us, qwen-plus-2025-12-01-us and later snapshots
- Qwen-Flash series: qwen-flash-us, qwen-flash-2025-07-28-us
Chinese Mainland
Commercial
- Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshots, qwen-max, qwen-max-latest, qwen-max-2024-09-19 and later snapshots
- Qwen-Plus series: qwen3.5-plus, qwen3.5-plus-2026-02-15 and later snapshots, qwen-plus, qwen-plus-latest, qwen-plus-2024-12-20 and later snapshots
- Qwen-Flash series: qwen-flash, qwen-flash-2025-07-28 and later snapshots
- Qwen-Turbo series: qwen-turbo, qwen-turbo-latest, qwen-turbo-2025-04-28 and later snapshots
- Qwen-Coder series: qwen3-coder-plus, qwen3-coder-plus-2025-07-22 and later snapshots, qwen3-coder-flash, qwen3-coder-flash-2025-07-28 and later snapshots, qwen-coder-plus, qwen-coder-plus-latest, qwen-coder-plus-2024-11-06, qwen-coder-turbo, qwen-coder-turbo-latest, qwen-coder-turbo-2024-09-19
- QwQ series: qwq-plus, qwq-plus-latest, qwq-plus-2025-03-05
- Qwen-Math: qwen-math-plus, qwen-math-plus-latest, qwen-math-plus-2024-08-16 and later snapshots, qwen-math-turbo, qwen-math-turbo-latest, qwen-math-turbo-2024-09-19
Open-source
qwen3.5-397b-a17b
qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b
qwen2.5-14b-instruct-1m, qwen2.5-7b-instruct-1m, qwen2.5-72b-instruct, qwen2.5-32b-instruct, qwen2.5-14b-instruct, qwen2.5-7b-instruct, qwen2.5-3b-instruct, qwen2.5-1.5b-instruct, qwen2.5-0.5b-instruct
OpenAI SDK
Prerequisites
- Python environment
- Latest OpenAI SDK:
  # If the following command reports an error, replace pip with pip3.
  pip install -U openai
- Active Model Studio account with an API key: Get an API key.
- API key exported as an environment variable: Export the API key as an environment variable. Setting the API key directly in code increases the risk of leakage.
- Model selected from the list above: Supported models.
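For reference, exporting the key in a POSIX shell looks like the line below. Replace the sk-xxx placeholder with your actual Model Studio API key, and add the line to ~/.bashrc or ~/.zshrc to persist it across sessions.

```shell
# Replace sk-xxx with your Model Studio API key.
export DASHSCOPE_API_KEY="sk-xxx"
```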
Non-streaming
from openai import OpenAI
import os

def get_response():
    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # Replace with api_key="sk-xxx" if you have not set an environment variable.
        # base_url for the Singapore region.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-plus",  # Replace the model name as needed. Model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=[
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': 'Who are you?'}
        ]
    )
    print(completion.model_dump_json())

if __name__ == '__main__':
    get_response()
Sample output:
{
"id": "chatcmpl-xxx",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"content": "I am a large-scale pre-trained model from Alibaba Cloud. My name is Qwen.",
"role": "assistant",
"function_call": null,
"tool_calls": null
}
}
],
"created": 1716430652,
"model": "qwen-plus",
"object": "chat.completion",
"system_fingerprint": null,
"usage": {
"completion_tokens": 18,
"prompt_tokens": 22,
"total_tokens": 40
}
}
Streaming
from openai import OpenAI
import os

def get_response():
    client = OpenAI(
        # Replace with api_key="sk-xxx" if you have not set an environment variable.
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # base_url for the Singapore region.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-plus",  # Replace the model name as needed. Model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=[
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': 'Who are you?'}
        ],
        stream=True,
        # Include token usage in the last chunk of the streaming output.
        stream_options={"include_usage": True}
    )
    for chunk in completion:
        print(chunk.model_dump_json())

if __name__ == '__main__':
    get_response()
Sample output:
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":"assistant","tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"I am a","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" large language model from","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" Alibaba","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" Cloud.","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" My name is Qwen.","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":{"completion_tokens":16,"prompt_tokens":22,"total_tokens":38}}
Tool calling example
This example demonstrates tool calling with weather and time query tools. The code supports multi-round tool calling.
from openai import OpenAI
from datetime import datetime
import json
import os

client = OpenAI(
    # Replace with api_key="sk-xxx" if you have not set an environment variable.
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # base_url for the Singapore region.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# Define tools. The model selects a tool based on its name and description.
tools = [
    # Tool 1: Get the current time.
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Useful for when you want to know the current time.",
            # No input parameters required, so parameters is an empty dictionary.
            "parameters": {}
        }
    },
    # Tool 2: Get weather for a specified city.
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Useful for querying the weather in a specific city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "A city or county, such as Beijing, Hangzhou, or Yuhang District."
                    }
                },
                # required belongs inside the parameters schema.
                "required": ["location"]
            }
        }
    }
]

# Simulated weather tool. Returns a sample result like "It is rainy in Beijing today."
def get_current_weather(location):
    return f"It is rainy in {location} today."

# Time tool. Returns a result like "Current time: 2024-04-15 17:15:18."
def get_current_time():
    formatted_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    return f"Current time: {formatted_time}."

# Send a request and return the model response.
def get_response(messages):
    completion = client.chat.completions.create(
        model="qwen-plus",  # Replace the model name as needed. Model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=messages,
        tools=tools
    )
    return completion.model_dump()

def call_with_messages():
    print('\n')
    messages = [
        {
            "content": input('Please enter your query:'),  # Example: "What time is it?" "What is the weather in Beijing?"
            "role": "user"
        }
    ]
    print("-" * 60)
    # First round of model calling.
    i = 1
    first_response = get_response(messages)
    assistant_output = first_response['choices'][0]['message']
    print(f"\nModel output in round {i}: {first_response}\n")
    if assistant_output['content'] is None:
        assistant_output['content'] = ""
    messages.append(assistant_output)
    # If no tool call is needed, return the answer directly.
    if assistant_output['tool_calls'] is None:
        print(f"No tool call is needed. I can answer directly: {assistant_output['content']}")
        return
    # If a tool call is needed, loop until the model stops calling tools.
    while assistant_output['tool_calls'] is not None:
        if assistant_output['tool_calls'][0]['function']['name'] == 'get_current_weather':
            tool_info = {"name": "get_current_weather", "role": "tool"}
            location = json.loads(assistant_output['tool_calls'][0]['function']['arguments'])['location']
            tool_info['content'] = get_current_weather(location)
        elif assistant_output['tool_calls'][0]['function']['name'] == 'get_current_time':
            tool_info = {"name": "get_current_time", "role": "tool"}
            tool_info['content'] = get_current_time()
        print(f"Tool output: {tool_info['content']}\n")
        print("-" * 60)
        messages.append(tool_info)
        assistant_output = get_response(messages)['choices'][0]['message']
        if assistant_output['content'] is None:
            assistant_output['content'] = ""
        messages.append(assistant_output)
        i += 1
        print(f"Model output in round {i}: {assistant_output}\n")
    print(f"Final answer: {assistant_output['content']}")

if __name__ == '__main__':
    call_with_messages()
If you enter "How is the weather in Singapore? What time is it now?", the program calls both tools over multiple rounds, printing each round's model output, the tool outputs, and the final answer.
LangChain OpenAI SDK
Prerequisites
- Python environment
- langchain_openai SDK:
  # If the following command reports an error, replace pip with pip3.
  pip install -U langchain_openai
- Active Model Studio account with an API key: Get an API key.
- API key exported as an environment variable: Export the API key as an environment variable. Setting the API key directly in code increases the risk of leakage.
- Model selected from the list above: Supported models.
Non-streaming
Use the invoke method for non-streaming output.
from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # Replace with api_key="sk-xxx" if you have not set an environment variable.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # base_url for the Singapore region.
        model="qwen-plus"  # Replace the model name as needed. Model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
    )
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"}
    ]
    response = llm.invoke(messages)
    print(response.json())

if __name__ == "__main__":
    get_response()
Sample output:
{
"content": "I am a large language model from Alibaba Cloud. My name is Qwen.",
"additional_kwargs": {},
"response_metadata": {
"token_usage": {
"completion_tokens": 16,
"prompt_tokens": 22,
"total_tokens": 38
},
"model_name": "qwen-plus",
"system_fingerprint": "",
"finish_reason": "stop",
"logprobs": null
},
"type": "ai",
"name": null,
"id": "run-xxx",
"example": false,
"tool_calls": [],
"invalid_tool_calls": []
}
Streaming
Use the stream method for streaming output. The stream parameter does not need to be set manually.
from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # Replace with api_key="sk-xxx" if you have not set an environment variable.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # base_url for the Singapore region.
        model="qwen-plus",  # Replace the model name as needed. Model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        stream_usage=True
    )
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ]
    response = llm.stream(messages)
    for chunk in response:
        print(chunk.model_dump_json())

if __name__ == "__main__":
    get_response()
Sample output:
{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "I am", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " a large", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " language model", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " from", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " Alibaba", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " Cloud", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": ", and my name is Qwen.", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {"finish_reason": "stop"}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": {"input_tokens": 22, "output_tokens": 16, "total_tokens": 38}, "tool_call_chunks": []}
For parameter details, see Request parameters. Set these parameters on the ChatOpenAI object.
HTTP API
Prerequisites
- Active Model Studio account with an API key: Get an API key.
- API key exported as an environment variable: Export the API key as an environment variable. Setting the API key directly in code increases the risk of leakage.
Endpoint
| Region | Endpoint |
| --- | --- |
| Singapore | POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions |
| Virginia | |
| Beijing | POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions |
Non-streaming
If you have not configured the API key as an environment variable, replace $DASHSCOPE_API_KEY with your API key.
# Endpoint for the Singapore region.
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Who are you?"
        }
    ]
}'
Sample output:
{
"choices": [
{
"message": {
"role": "assistant",
"content": "I am a large language model from Alibaba Cloud. My name is Qwen."
},
"finish_reason": "stop",
"index": 0,
"logprobs": null
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 11,
"completion_tokens": 16,
"total_tokens": 27
},
"created": 1715252778,
"system_fingerprint": "",
"model": "qwen-plus",
"id": "chatcmpl-xxx"
}
Streaming
Set "stream": true in the request body to enable streaming.
# Endpoint for the Singapore region.
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Who are you?"
        }
    ],
    "stream": true
}'
Sample output:
data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":" a large-scale"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":" language model"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":" from Alibaba Cloud"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":", and my name is Qwen."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":""},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: [DONE]
Error response
If a request fails, the response includes code and message fields that indicate the cause.
{
"error": {
"message": "Incorrect API key provided. ",
"type": "invalid_request_error",
"param": null,
"code": "invalid_api_key"
}
}
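The error body is plain JSON, so the code and message can be pulled out with any JSON library. A quick sketch using the body above:

```python
import json

# Sample error body, as returned for an invalid API key.
body = '{"error": {"message": "Incorrect API key provided. ", "type": "invalid_request_error", "param": null, "code": "invalid_api_key"}}'
err = json.loads(body)["error"]
print(f"{err['code']}: {err['message'].strip()}")
# invalid_api_key: Incorrect API key provided.
```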
Request parameters
The following parameters align with the OpenAI interface.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | - | Required. The model to use. See Supported models. |
| messages | array | - | Required. Conversation history. Each element has the format {"role": role, "content": content}. |
| top_p | float | - | Optional. Nucleus sampling probability threshold. For example, 0.8 means the model samples from the smallest set of tokens whose cumulative probability reaches 0.8 or higher. Valid range: (0, 1.0). Higher values increase randomness. |
| temperature | float | - | Optional. Controls output randomness. Higher values produce more diverse results; lower values produce more deterministic results. Valid range: [0, 2). Do not set this to 0. |
| presence_penalty | float | - | Optional. Reduces repetition in generated content. Higher values reduce repetition more. Valid range: [-2.0, 2.0]. Note: Supported only on Qwen commercial and open-source models of qwen1.5 and later. |
| n | integer | 1 | Optional. Number of responses to generate. Valid range: 1 to 4. Useful for creative tasks such as ad copy. Does not increase input token consumption but increases output token consumption. Important: Supported only for the qwen-plus model. Must be 1 when the tools parameter is set. |
| max_tokens | integer | - | Optional. Maximum number of tokens the model can generate. Each model has its own output limit. See the model list for details. |
| seed | integer | - | Optional. Random number seed for generation. Supports unsigned 64-bit integers. |
| stream | boolean | False | Optional. Enables streaming output. When enabled, the API returns a generator. Iterate over it to receive incremental results. |
| stop | string or array | None | Optional. Stops generation when the model is about to output a specified string or token. Accepts a string or an array. When using a string, the model stops before generating that string. When using an array, elements can be token IDs, strings, or arrays of token IDs. Note: Do not mix token IDs and strings in the same array. |
| tools | array | None | Optional. Defines tools the model can call. Each tool has a type field set to "function" and a function field that specifies the tool's name, description, and parameters. |
| stream_options | object | None | Optional. Takes effect only when stream is true. Set {"include_usage": true} to return token usage in the last chunk of the streaming output. |
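As an illustration, a request body combining several of the optional parameters above might look like this; the values are arbitrary examples, not recommendations.

```python
import json

# Example request body; the same keys are accepted as keyword
# arguments by client.chat.completions.create(...).
payload = {
    "model": "qwen-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "max_tokens": 256,
    "seed": 1234,
    "stream": True,
    "stream_options": {"include_usage": True},
}
print(json.dumps(payload, indent=2))
```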
Response parameters
| Parameter | Type | Description |
| --- | --- | --- |
| id | string | System-generated ID for the call. |
| model | string | The model used. |
| system_fingerprint | string | Configuration version of the model runtime. Not currently supported; returns an empty string. |
| choices | array | Content generated by the model. |
| choices[i].finish_reason | string | stop: generation finished naturally or a stop condition was met. length: generation stopped because the output reached the token limit. tool_calls: the model called a tool. |
| choices[i].message | object | The model output message. |
| choices[i].message.role | string | Fixed to assistant. |
| choices[i].message.content | string | Generated text. |
| choices[i].index | integer | Sequence number of the result. Default: 0. |
| created | integer | UNIX timestamp (seconds) of when the result was generated. |
| usage | object | Token consumption for the request. |
| usage.prompt_tokens | integer | Number of tokens in the input. |
| usage.completion_tokens | integer | Number of tokens in the generated response. |
| usage.total_tokens | integer | Sum of prompt_tokens and completion_tokens. |
Error codes
| Code | Description |
| --- | --- |
| 400 - Invalid Request Error | The request is invalid. See the error message for details. |
| 401 - Incorrect API key provided | The API key is incorrect. |
| 429 - Rate limit reached for requests | Rate limit reached. The queries per second (QPS), queries per minute (QPM), or other rate limits are exceeded. |
| 429 - You exceeded your current quota, please check your plan and billing details | Quota exceeded or account has an overdue payment. |
| 500 - The server had an error while processing your request | A server-side error occurred. |
| 503 - The engine is currently overloaded, please try again later | The server is overloaded. Try again later. |
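429 responses are transient, so a common pattern is to retry with exponential backoff. A minimal sketch: the `with_retries` helper and the RuntimeError stand-in are ours (the OpenAI SDK raises its own RateLimitError and also offers built-in retries via the client's max_retries option).

```python
import time

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Retry a callable on throttling errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError:  # stand-in for the SDK's RateLimitError
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Demo with a simulated endpoint that fails twice, then succeeds.
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("429: rate limit reached")
    return "ok"

print(with_retries(flaky, base_delay=0.01))
# ok
```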