
Alibaba Cloud Model Studio: OpenAI compatible - Chat

Last Updated: Mar 02, 2026

Migrate existing OpenAI code to Alibaba Cloud Model Studio by updating three values: the API key, base_url, and model name.

base_url

SDK base_url

| Region | base_url |
| --- | --- |
| Singapore | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 |
| Virginia | https://dashscope-us.aliyuncs.com/compatible-mode/v1 |
| Beijing | https://dashscope.aliyuncs.com/compatible-mode/v1 |
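The region-to-URL mapping above can be captured in code. The helper below is a sketch for illustration only; base_url_for is not part of any SDK:

```python
# Hypothetical helper: map a Model Studio region to its compatible-mode base_url.
BASE_URLS = {
    "Singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "Virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
    "Beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}

def base_url_for(region: str) -> str:
    """Return the compatible-mode base_url for one of the regions listed above."""
    try:
        return BASE_URLS[region]
    except KeyError:
        raise ValueError(f"Unsupported region: {region!r}") from None
```

Pass the returned URL as the base_url argument when constructing the OpenAI client.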

HTTP endpoint

| Region | Endpoint |
| --- | --- |
| Singapore | POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions |
| Virginia | POST https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions |
| Beijing | POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions |

Supported models

Global

Commercial

  • Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshots

  • Qwen-Plus series: qwen-plus, qwen-plus-latest, qwen-plus-2025-01-25 and later snapshots

  • Qwen-Flash series: qwen-flash, qwen-flash-2025-07-28 and later snapshots

Open-source

qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b

International

Commercial

  • Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshots, qwen-max, qwen-max-latest, qwen-max-2025-01-25 and later snapshots

  • Qwen-Plus series: qwen3.5-plus, qwen3.5-plus-2026-02-15 and later snapshots, qwen-plus, qwen-plus-latest, qwen-plus-2025-01-25 and later snapshots

  • Qwen-Flash series: qwen-flash, qwen-flash-2025-07-28 and later snapshots

  • Qwen-Turbo series: qwen-turbo, qwen-turbo-latest, qwen-turbo-2024-11-01 and later snapshots

  • Qwen-Coder series: qwen3-coder-plus, qwen3-coder-plus-2025-07-22 and later snapshots, qwen3-coder-flash, qwen3-coder-flash-2025-07-28 and later snapshots

  • QwQ series: qwq-plus

Open-source

qwen3.5-397b-a17b

qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b

qwen2.5-14b-instruct-1m, qwen2.5-7b-instruct-1m, qwen2.5-72b-instruct, qwen2.5-32b-instruct, qwen2.5-14b-instruct, qwen2.5-7b-instruct

US

Commercial

  • Qwen-Plus series: qwen-plus-us, qwen-plus-2025-12-01-us and later snapshots

  • Qwen-Flash series: qwen-flash-us, qwen-flash-2025-07-28-us

Chinese Mainland

Commercial

  • Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshots, qwen-max, qwen-max-latest, qwen-max-2024-09-19 and later snapshots

  • Qwen-Plus series: qwen3.5-plus, qwen3.5-plus-2026-02-15 and later snapshots, qwen-plus, qwen-plus-latest, qwen-plus-2024-12-20 and later snapshots

  • Qwen-Flash series: qwen-flash, qwen-flash-2025-07-28 and later snapshots

  • Qwen-Turbo series: qwen-turbo, qwen-turbo-latest, qwen-turbo-2025-04-28 and later snapshots

  • Qwen-Coder series: qwen3-coder-plus, qwen3-coder-plus-2025-07-22 and later snapshots, qwen3-coder-flash, qwen3-coder-flash-2025-07-28 and later snapshots, qwen-coder-plus, qwen-coder-plus-latest, qwen-coder-plus-2024-11-06, qwen-coder-turbo, qwen-coder-turbo-latest, qwen-coder-turbo-2024-09-19

  • QwQ series: qwq-plus, qwq-plus-latest, qwq-plus-2025-03-05

  • Qwen-Math: qwen-math-plus, qwen-math-plus-latest, qwen-math-plus-2024-08-16 and later snapshots, qwen-math-turbo, qwen-math-turbo-latest, qwen-math-turbo-2024-09-19

Open-source

qwen3.5-397b-a17b

qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b

qwen2.5-14b-instruct-1m, qwen2.5-7b-instruct-1m, qwen2.5-72b-instruct, qwen2.5-32b-instruct, qwen2.5-14b-instruct, qwen2.5-7b-instruct, qwen2.5-3b-instruct, qwen2.5-1.5b-instruct, qwen2.5-0.5b-instruct

OpenAI SDK

Prerequisites

  • Python environment

  • Latest OpenAI SDK

        # If the following command reports an error, replace pip with pip3.
        pip install -U openai
  • Active Model Studio account with an API key: Get an API key.

  • API key exported as an environment variable: Export the API key as an environment variable.

    Setting the API key directly in code increases the risk of leakage.
  • Model selected from the list above: Supported models.
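Before running the examples, you can fail fast if the environment variable is missing instead of sending an unauthenticated request. The require_api_key helper below is a hypothetical convenience, not part of the OpenAI SDK:

```python
import os

# Hypothetical helper: return the Model Studio API key from the environment,
# or raise a clear error if it has not been exported.
def require_api_key(env=os.environ):
    key = env.get("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set. Export it first, for example "
            'export DASHSCOPE_API_KEY="sk-xxx" in bash or zsh.'
        )
    return key
```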

Non-streaming

from openai import OpenAI
import os

def get_response():
    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # Replace with api_key="sk-xxx" if you have not set an environment variable.
        # base_url for the Singapore region.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-plus",  # Replace the model name as needed. Model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=[{'role': 'system', 'content': 'You are a helpful assistant.'},
                  {'role': 'user', 'content': 'Who are you?'}]
        )
    print(completion.model_dump_json())

if __name__ == '__main__':
    get_response()

Sample output:

{
    "id": "chatcmpl-xxx",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "I am a large-scale pre-trained model from Alibaba Cloud. My name is Qwen.",
                "role": "assistant",
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1716430652,
    "model": "qwen-plus",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 18,
        "prompt_tokens": 22,
        "total_tokens": 40
    }
}

Streaming

from openai import OpenAI
import os


def get_response():
    client = OpenAI(
        # Replace with api_key="sk-xxx" if you have not set an environment variable.
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # base_url for the Singapore region.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-plus",  # Replace the model name as needed. Model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=[{'role': 'system', 'content': 'You are a helpful assistant.'},
                  {'role': 'user', 'content': 'Who are you?'}],
        stream=True,
        # Include token usage in the last chunk of the streaming output.
        stream_options={"include_usage": True}
        )
    for chunk in completion:
        print(chunk.model_dump_json())


if __name__ == '__main__':
    get_response()

Sample output:

{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":"assistant","tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"I am a","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" large language model from","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" Alibaba","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" Cloud.","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" My name is Qwen.","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":{"completion_tokens":16,"prompt_tokens":22,"total_tokens":38}}
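To reconstruct the full reply from a stream like the one above, concatenate the delta content fragments and skip the final usage-only chunk, whose choices list is empty. The sketch below operates on plain dicts that mirror the chunk shape shown; in real code each chunk is an SDK object and you would read chunk.choices[0].delta.content instead:

```python
# Sketch: assemble the full reply text from streamed chunks (dict form).
def assemble(chunks):
    text = []
    for chunk in chunks:
        # The final usage-only chunk has an empty choices list: skip it.
        if not chunk["choices"]:
            continue
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            text.append(delta)
    return "".join(text)

sample = [
    {"choices": [{"delta": {"content": "I am a"}}]},
    {"choices": [{"delta": {"content": " large language model."}}]},
    {"choices": []},  # usage chunk
]
print(assemble(sample))  # -> I am a large language model.
```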

Tool calling example

This example demonstrates tool calling with weather and time query tools. The code supports multi-round tool calling.

from openai import OpenAI
from datetime import datetime
import json
import os

client = OpenAI(
    # Replace with api_key="sk-xxx" if you have not set an environment variable.
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # base_url for the Singapore region.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# Define tools. The model selects a tool based on its name and description.
tools = [
    # Tool 1: Get the current time.
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Useful for when you want to know the current time.",
            # No input parameters required, so parameters is an empty dictionary.
            "parameters": {}
        }
    },
    # Tool 2: Get weather for a specified city.
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Useful for querying the weather in a specific city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "A city or county, such as Beijing, Hangzhou, or Yuhang District."
                    }
                }
            },
            "required": [
                "location"
            ]
        }
    }
]

# Simulated weather tool. Returns a sample result like "It is rainy in Beijing today."
def get_current_weather(location):
    return f"It is rainy in {location} today. "

# Time tool. Returns a result like "Current time: 2024-04-15 17:15:18."
def get_current_time():
    current_datetime = datetime.now()
    formatted_time = current_datetime.strftime('%Y-%m-%d %H:%M:%S')
    return f"Current time: {formatted_time}."

# Send a request and return the model response.
def get_response(messages):
    completion = client.chat.completions.create(
        model="qwen-plus",  # Replace the model name as needed. Model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=messages,
        tools=tools
        )
    return completion.model_dump()

def call_with_messages():
    print('\n')
    messages = [
            {
                "content": input('Please enter your query:'),  # Example: "What time is it?" "What is the weather in Beijing?"
                "role": "user"
            }
    ]
    print("-"*60)
    # First round of model calling.
    i = 1
    first_response = get_response(messages)
    assistant_output = first_response['choices'][0]['message']
    print(f"\nModel output in round {i}: {first_response}\n")
    if assistant_output['content'] is None:
        assistant_output['content'] = ""
    messages.append(assistant_output)
    # If no tool call is needed, return the answer directly.
    if assistant_output['tool_calls'] is None:
        print(f"No tool call is needed. I can answer directly: {assistant_output['content']}")
        return
    # If a tool call is needed, loop until the model stops calling tools.
    while assistant_output['tool_calls'] is not None:
        if assistant_output['tool_calls'][0]['function']['name'] == 'get_current_weather':
            tool_info = {"name": "get_current_weather", "role":"tool"}
            location = json.loads(assistant_output['tool_calls'][0]['function']['arguments'])['location']
            tool_info['content'] = get_current_weather(location)
        elif assistant_output['tool_calls'][0]['function']['name'] == 'get_current_time':
            tool_info = {"name": "get_current_time", "role":"tool"}
            tool_info['content'] = get_current_time()
        print(f"Tool output: {tool_info['content']}\n")
        print("-"*60)
        messages.append(tool_info)
        assistant_output = get_response(messages)['choices'][0]['message']
        if assistant_output['content'] is None:
            assistant_output['content'] = ""
        messages.append(assistant_output)
        i += 1
        print(f"Model output in round {i}: {assistant_output}\n")
    print(f"Final answer: {assistant_output['content']}")

if __name__ == '__main__':
    call_with_messages()

If you enter "How is the weather in Singapore? What time is it now?", the program produces the following output:

[Figure: Tool calling example output]

LangChain OpenAI SDK

Prerequisites

  • Python environment

  • langchain_openai SDK

        # If the following command reports an error, replace pip with pip3.
        pip install -U langchain_openai
  • Active Model Studio account with an API key: Get an API key.

  • API key exported as an environment variable: Export the API key as an environment variable.

    Setting the API key directly in code increases the risk of leakage.
  • Model selected from the list above: Supported models.

Non-streaming

Use the invoke method for non-streaming output.

from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # Replace with api_key="sk-xxx" if you have not set an environment variable.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # base_url for the Singapore region.
        model="qwen-plus"  # Replace the model name as needed. Model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        )
    messages = [
        {"role":"system","content":"You are a helpful assistant."},
        {"role":"user","content":"Who are you?"}
    ]
    response = llm.invoke(messages)
    print(response.model_dump_json())

if __name__ == "__main__":
    get_response()

Sample output:

{
    "content": "I am a large language model from Alibaba Cloud. My name is Qwen.",
    "additional_kwargs": {},
    "response_metadata": {
        "token_usage": {
            "completion_tokens": 16,
            "prompt_tokens": 22,
            "total_tokens": 38
        },
        "model_name": "qwen-plus",
        "system_fingerprint": "",
        "finish_reason": "stop",
        "logprobs": null
    },
    "type": "ai",
    "name": null,
    "id": "run-xxx",
    "example": false,
    "tool_calls": [],
    "invalid_tool_calls": []
}

Streaming

Use the stream method for streaming output. The stream parameter does not need to be set manually.

from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # Replace with api_key="sk-xxx" if you have not set an environment variable.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # base_url for the Singapore region.
        model="qwen-plus",  # Replace the model name as needed. Model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        stream_usage=True
        )
    messages = [
        {"role":"system","content":"You are a helpful assistant."},
        {"role":"user","content":"Who are you?"},
    ]
    response = llm.stream(messages)
    for chunk in response:
        print(chunk.model_dump_json())

if __name__ == "__main__":
    get_response()

Sample output:

{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "I am", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " a large", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " language model", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " from", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " Alibaba", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " Cloud", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": ", and my name is Qwen.", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {"finish_reason": "stop"}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": {"input_tokens": 22, "output_tokens": 16, "total_tokens": 38}, "tool_call_chunks": []}

For parameter details, see Request parameters. Set these parameters on the ChatOpenAI object.
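For example, sampling parameters can be set directly on the ChatOpenAI object. This is a configuration sketch; the values are illustrative and the Singapore base_url is assumed:

```python
from langchain_openai import ChatOpenAI
import os

# Configuration sketch: request parameters such as temperature, top_p, and
# max_tokens are fields on the ChatOpenAI object itself.
llm = ChatOpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    model="qwen-plus",
    temperature=0.7,   # illustrative value
    top_p=0.8,         # illustrative value
    max_tokens=512,    # illustrative value
)
```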

HTTP API

Prerequisites

Endpoint

| Region | Endpoint |
| --- | --- |
| Singapore | POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions |
| Virginia | POST https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions |
| Beijing | POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions |

Non-streaming

If you have not configured the API key as an environment variable, replace $DASHSCOPE_API_KEY with your API key.
# base_url for the Singapore region.
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Who are you?"
        }
    ]
}'

Sample output:

{
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "I am a large language model from Alibaba Cloud. My name is Qwen."
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 11,
        "completion_tokens": 16,
        "total_tokens": 27
    },
    "created": 1715252778,
    "system_fingerprint": "",
    "model": "qwen-plus",
    "id": "chatcmpl-xxx"
}

Streaming

Set "stream": true in the request body to enable streaming.

# base_url for the Singapore region.
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Who are you?"
        }
    ],
    "stream":true
}'

Sample output:

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":" a large-scale"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":" language model"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":" from Alibaba Cloud"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":", and my name is Qwen."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":""},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: [DONE]

Error response

If a request fails, the response includes code and message fields that indicate the cause.

{
    "error": {
        "message": "Incorrect API key provided. ",
        "type": "invalid_request_error",
        "param": null,
        "code": "invalid_api_key"
    }
}
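The code and message fields can be read programmatically. A minimal sketch, parsing the example body above with the standard library:

```python
import json

# Pull the machine-readable fields out of an error response body
# shaped like the example above.
body = '''{
    "error": {
        "message": "Incorrect API key provided. ",
        "type": "invalid_request_error",
        "param": null,
        "code": "invalid_api_key"
    }
}'''
err = json.loads(body)["error"]
print(err["code"], "-", err["message"].strip())
# -> invalid_api_key - Incorrect API key provided.
```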

Request parameters

The following parameters align with the OpenAI interface.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | - | Required. The model to use. See Supported models. |
| messages | array | - | Required. Conversation history. Each element has the format {"role": "Role", "content": "Content"}. Valid roles: system, user, assistant. The system role can only appear in messages[0]. Typically, user and assistant alternate, and the last element must have the user role. |
| top_p | float | - | Optional. Nucleus sampling probability threshold. For example, 0.8 means the model samples from the smallest set of tokens whose cumulative probability reaches 0.8 or higher. Valid range: (0, 1.0). Higher values increase randomness. |
| temperature | float | - | Optional. Controls output randomness. Higher values produce more diverse results; lower values produce more deterministic results. Valid range: [0, 2). Do not set this to 0. |
| presence_penalty | float | - | Optional. Reduces repetition in generated content. Higher values reduce repetition more. Valid range: [-2.0, 2.0]. Note: Supported only on Qwen commercial and open-source models of qwen1.5 and later. |
| n | integer | 1 | Optional. Number of responses to generate. Valid range: 1 to 4. Useful for creative tasks such as ad copy. Does not increase input token consumption but increases output token consumption. Important: Supported only for the qwen-plus model. Must be 1 when the tools parameter is set. |
| max_tokens | integer | - | Optional. Maximum number of tokens the model can generate. Each model has its own output limit. See the model list for details. |
| seed | integer | - | Optional. Random number seed for generation. Supports unsigned 64-bit integers. |
| stream | boolean | False | Optional. Enables streaming output. When enabled, the API returns a generator. Iterate over it to receive incremental results. |
| stop | string or array | None | Optional. Stops generation when the model is about to output a specified string or token. Accepts a string or an array. When using a string, the model stops before generating that string. When using an array, elements can be token IDs, strings, or arrays of token IDs. Note: Do not mix token IDs and strings in the same array. For example, ["Hello", 104307] is not valid. |
| tools | array | None | Optional. Defines tools the model can call. Each tool has a type field (currently only "function") and a function object with name, description, and parameters. The name field allows only letters, digits, underscores, and hyphens (max 64 characters). The parameters field must be valid JSON Schema. Supported models: qwen-turbo, qwen-plus, and qwen-max. Cannot be used with stream=True. |
| stream_options | object | None | Optional. Takes effect only when stream is True. Set to {"include_usage": true} to include token usage in the last chunk of the streaming output. |
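As a sketch, a request body that combines several of the optional parameters above might look like this (values are illustrative). In the OpenAI SDK these are passed as keyword arguments to client.chat.completions.create:

```python
# Sketch: a request body combining several optional parameters from the
# table above. Values are illustrative, not recommendations.
request_body = {
    "model": "qwen-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a one-line slogan for a tea shop."},
    ],
    "temperature": 0.7,   # more diverse output
    "top_p": 0.8,         # nucleus sampling threshold in (0, 1.0)
    "max_tokens": 128,    # cap on generated tokens
    "seed": 1234,         # reproducible sampling
    "stop": ["\n\n"],     # stop before a blank line
}
```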

Response parameters

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | System-generated ID for the call. |
| model | string | The model used. |
| system_fingerprint | string | Configuration version of the model runtime. Not currently supported; returns null. |
| choices | array | Content generated by the model. |
| choices[i].finish_reason | string | null during generation, stop when a stop condition is triggered, length when the maximum output length is reached. |
| choices[i].message | object | The model output message. |
| choices[i].message.role | string | Fixed to assistant. |
| choices[i].message.content | string | Generated text. |
| choices[i].index | integer | Sequence number of the result. Default: 0. |
| created | integer | UNIX timestamp (seconds) of when the result was generated. |
| usage | object | Token consumption for the request. |
| usage.prompt_tokens | integer | Number of tokens in the input. |
| usage.completion_tokens | integer | Number of tokens in the generated response. |
| usage.total_tokens | integer | Sum of prompt_tokens and completion_tokens. |
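A minimal sketch of reading these fields, using a dict shaped like the non-streaming sample output earlier in this topic:

```python
# Sketch: extract the answer text and token usage from a response dict.
response = {
    "id": "chatcmpl-xxx",
    "model": "qwen-plus",
    "choices": [{"index": 0, "finish_reason": "stop",
                 "message": {"role": "assistant", "content": "Hello!"}}],
    "usage": {"prompt_tokens": 22, "completion_tokens": 16, "total_tokens": 38},
}

answer = response["choices"][0]["message"]["content"]
usage = response["usage"]
# total_tokens is the sum of prompt and completion tokens.
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
print(answer, usage["total_tokens"])  # -> Hello! 38
```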

Error codes

| Code | Description |
| --- | --- |
| 400 - Invalid Request Error | Invalid request. See the error message for details. |
| 401 - Incorrect API key provided | The API key is incorrect. |
| 429 - Rate limit reached for requests | The queries per second (QPS), queries per minute (QPM), or other rate limits are exceeded. |
| 429 - You exceeded your current quota, please check your plan and billing details | Quota exceeded or the account has an overdue payment. |
| 500 - The server had an error while processing your request | A server-side error occurred. |
| 503 - The engine is currently overloaded, please try again later | The server is overloaded. Try again later. |