
Alibaba Cloud Model Studio: OpenAI compatible - Chat

Last Updated: Mar 02, 2026

Migrate existing OpenAI code to Alibaba Cloud Model Studio by updating three values: the API key, base_url, and model name.

base_url

SDK base_url

| Region | base_url |
| --- | --- |
| Singapore | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 |
| Virginia | https://dashscope-us.aliyuncs.com/compatible-mode/v1 |
| Beijing | https://dashscope.aliyuncs.com/compatible-mode/v1 |
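The region-to-URL mapping above can be captured in code. The helper below is a sketch for illustration only; base_url_for is not part of any SDK:

```python
# Hypothetical helper: map a Model Studio region to its compatible-mode base_url.
BASE_URLS = {
    "Singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "Virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
    "Beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}

def base_url_for(region: str) -> str:
    """Return the compatible-mode base_url for one of the regions listed above."""
    try:
        return BASE_URLS[region]
    except KeyError:
        raise ValueError(f"Unsupported region: {region!r}") from None
```

Pass the returned URL as the base_url argument when constructing the OpenAI client.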

HTTP endpoint

| Region | Endpoint |
| --- | --- |
| Singapore | POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions |
| Virginia | POST https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions |
| Beijing | POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions |

Supported models

Global

Commercial

  • Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshots

  • Qwen-Plus series: qwen-plus, qwen-plus-latest, qwen-plus-2025-01-25 and later snapshots

  • Qwen-Flash series: qwen-flash, qwen-flash-2025-07-28 and later snapshots

Open-source

qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b

International

Commercial

  • Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshots, qwen-max, qwen-max-latest, qwen-max-2025-01-25 and later snapshots

  • Qwen-Plus series: qwen3.5-plus, qwen3.5-plus-2026-02-15 and later snapshots, qwen-plus, qwen-plus-latest, qwen-plus-2025-01-25 and later snapshots

  • Qwen-Flash series: qwen-flash, qwen-flash-2025-07-28 and later snapshots

  • Qwen-Turbo series: qwen-turbo, qwen-turbo-latest, qwen-turbo-2024-11-01 and later snapshots

  • Qwen-Coder series: qwen3-coder-plus, qwen3-coder-plus-2025-07-22 and later snapshots, qwen3-coder-flash, qwen3-coder-flash-2025-07-28 and later snapshots

  • QwQ series: qwq-plus

Open-source

qwen3.5-397b-a17b

qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b

qwen2.5-14b-instruct-1m, qwen2.5-7b-instruct-1m, qwen2.5-72b-instruct, qwen2.5-32b-instruct, qwen2.5-14b-instruct, qwen2.5-7b-instruct

US

Commercial

  • Qwen-Plus series: qwen-plus-us, qwen-plus-2025-12-01-us and later snapshots

  • Qwen-Flash series: qwen-flash-us, qwen-flash-2025-07-28-us

Chinese Mainland

Commercial

  • Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshots, qwen-max, qwen-max-latest, qwen-max-2024-09-19 and later snapshots

  • Qwen-Plus series: qwen3.5-plus, qwen3.5-plus-2026-02-15 and later snapshots, qwen-plus, qwen-plus-latest, qwen-plus-2024-12-20 and later snapshots

  • Qwen-Flash series: qwen-flash, qwen-flash-2025-07-28 and later snapshots

  • Qwen-Turbo series: qwen-turbo, qwen-turbo-latest, qwen-turbo-2025-04-28 and later snapshots

  • Qwen-Coder series: qwen3-coder-plus, qwen3-coder-plus-2025-07-22 and later snapshots, qwen3-coder-flash, qwen3-coder-flash-2025-07-28 and later snapshots, qwen-coder-plus, qwen-coder-plus-latest, qwen-coder-plus-2024-11-06, qwen-coder-turbo, qwen-coder-turbo-latest, qwen-coder-turbo-2024-09-19

  • QwQ series: qwq-plus, qwq-plus-latest, qwq-plus-2025-03-05

  • Qwen-Math: qwen-math-plus, qwen-math-plus-latest, qwen-math-plus-2024-08-16 and later snapshots, qwen-math-turbo, qwen-math-turbo-latest, qwen-math-turbo-2024-09-19

Open-source

qwen3.5-397b-a17b

qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b

qwen2.5-14b-instruct-1m, qwen2.5-7b-instruct-1m, qwen2.5-72b-instruct, qwen2.5-32b-instruct, qwen2.5-14b-instruct, qwen2.5-7b-instruct, qwen2.5-3b-instruct, qwen2.5-1.5b-instruct, qwen2.5-0.5b-instruct

OpenAI SDK

Prerequisites

  • Python environment

  • Latest OpenAI SDK

        # If the following command reports an error, replace pip with pip3.
        pip install -U openai
  • Active Model Studio account with an API key: Get an API key.

  • API key exported as an environment variable: Export the API key as an environment variable.

    Setting the API key directly in code increases the risk of leakage.
  • Model selected from the list above: Supported models.
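Before running the examples, you can fail fast if the environment variable is missing instead of sending an unauthenticated request. The require_api_key helper below is a hypothetical convenience, not part of the OpenAI SDK:

```python
import os

# Hypothetical helper: return the Model Studio API key from the environment,
# or raise a clear error if it has not been exported.
def require_api_key(env=os.environ):
    key = env.get("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set. Export it first, for example "
            'export DASHSCOPE_API_KEY="sk-xxx" in bash or zsh.'
        )
    return key
```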

Non-streaming

from openai import OpenAI
import os

def get_response():
    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # Replace with api_key="sk-xxx" if you have not set an environment variable.
        # base_url for the Singapore region.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-plus",  # Replace the model name as needed. Model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=[{'role': 'system', 'content': 'You are a helpful assistant.'},
                  {'role': 'user', 'content': 'Who are you?'}]
        )
    print(completion.model_dump_json())

if __name__ == '__main__':
    get_response()

Sample output:

{
    "id": "chatcmpl-xxx",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "I am a large-scale pre-trained model from Alibaba Cloud. My name is Qwen.",
                "role": "assistant",
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1716430652,
    "model": "qwen-plus",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 18,
        "prompt_tokens": 22,
        "total_tokens": 40
    }
}

Streaming

from openai import OpenAI
import os


def get_response():
    client = OpenAI(
        # Replace with api_key="sk-xxx" if you have not set an environment variable.
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # base_url for the Singapore region.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-plus",  # Replace the model name as needed. Model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=[{'role': 'system', 'content': 'You are a helpful assistant.'},
                  {'role': 'user', 'content': 'Who are you?'}],
        stream=True,
        # Include token usage in the last chunk of the streaming output.
        stream_options={"include_usage": True}
        )
    for chunk in completion:
        print(chunk.model_dump_json())


if __name__ == '__main__':
    get_response()

Sample output:

{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":"assistant","tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"I am a","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" large language model from","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" Alibaba","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" Cloud.","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" My name is Qwen.","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":{"completion_tokens":16,"prompt_tokens":22,"total_tokens":38}}
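To reconstruct the full reply from a stream like the one above, concatenate the delta content fragments and skip the final usage-only chunk, whose choices list is empty. The sketch below operates on plain dicts that mirror the chunk shape shown; in real code each chunk is an SDK object and you would read chunk.choices[0].delta.content instead:

```python
# Sketch: assemble the full reply text from streamed chunks (dict form).
def assemble(chunks):
    text = []
    for chunk in chunks:
        # The final usage-only chunk has an empty choices list: skip it.
        if not chunk["choices"]:
            continue
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            text.append(delta)
    return "".join(text)

sample = [
    {"choices": [{"delta": {"content": "I am a"}}]},
    {"choices": [{"delta": {"content": " large language model."}}]},
    {"choices": []},  # usage chunk
]
print(assemble(sample))  # -> I am a large language model.
```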

Tool calling example

This example demonstrates tool calling with weather and time query tools. The code supports multi-round tool calling.

from openai import OpenAI
from datetime import datetime
import json
import os

client = OpenAI(
    # Replace with api_key="sk-xxx" if you have not set an environment variable.
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # base_url for the Singapore region.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# Define tools. The model selects a tool based on its name and description.
tools = [
    # Tool 1: Get the current time.
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Useful for when you want to know the current time.",
            # No input parameters required, so parameters is an empty dictionary.
            "parameters": {}
        }
    },
    # Tool 2: Get weather for a specified city.
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Useful for querying the weather in a specific city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "A city or county, such as Beijing, Hangzhou, or Yuhang District."
                    }
                }
            },
            "required": [
                "location"
            ]
        }
    }
]

# Simulated weather tool. Returns a sample result like "It is rainy in Beijing today."
def get_current_weather(location):
    return f"It is rainy in {location} today. "

# Time tool. Returns a result like "Current time: 2024-04-15 17:15:18."
def get_current_time():
    current_datetime = datetime.now()
    formatted_time = current_datetime.strftime('%Y-%m-%d %H:%M:%S')
    return f"Current time: {formatted_time}."

# Send a request and return the model response.
def get_response(messages):
    completion = client.chat.completions.create(
        model="qwen-plus",  # Replace the model name as needed. Model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=messages,
        tools=tools
        )
    return completion.model_dump()

def call_with_messages():
    print('\n')
    messages = [
            {
                "content": input('Please enter your query:'),  # Example: "What time is it?" "What is the weather in Beijing?"
                "role": "user"
            }
    ]
    print("-"*60)
    # First round of model calling.
    i = 1
    first_response = get_response(messages)
    assistant_output = first_response['choices'][0]['message']
    print(f"\nModel output in round {i}: {first_response}\n")
    if assistant_output['content'] is None:
        assistant_output['content'] = ""
    messages.append(assistant_output)
    # If no tool call is needed, return the answer directly.
    if assistant_output['tool_calls'] is None:
        print(f"No tool call is needed. I can answer directly: {assistant_output['content']}")
        return
    # If a tool call is needed, loop until the model stops calling tools.
    while assistant_output['tool_calls'] is not None:
        if assistant_output['tool_calls'][0]['function']['name'] == 'get_current_weather':
            tool_info = {"name": "get_current_weather", "role":"tool"}
            location = json.loads(assistant_output['tool_calls'][0]['function']['arguments'])['location']
            tool_info['content'] = get_current_weather(location)
        elif assistant_output['tool_calls'][0]['function']['name'] == 'get_current_time':
            tool_info = {"name": "get_current_time", "role":"tool"}
            tool_info['content'] = get_current_time()
        print(f"Tool output: {tool_info['content']}\n")
        print("-"*60)
        messages.append(tool_info)
        assistant_output = get_response(messages)['choices'][0]['message']
        if assistant_output['content'] is None:
            assistant_output['content'] = ""
        messages.append(assistant_output)
        i += 1
        print(f"Model output in round {i}: {assistant_output}\n")
    print(f"Final answer: {assistant_output['content']}")

if __name__ == '__main__':
    call_with_messages()

If you enter "How is the weather in Singapore? What time is it now?", the program produces the following output:

[Figure: Tool calling example output]

LangChain OpenAI SDK

Prerequisites

  • Python environment

  • langchain_openai SDK

        # If the following command reports an error, replace pip with pip3.
        pip install -U langchain_openai
  • Active Model Studio account with an API key: Get an API key.

  • API key exported as an environment variable: Export the API key as an environment variable.

    Setting the API key directly in code increases the risk of leakage.
  • Model selected from the list above: Supported models.

Non-streaming

Use the invoke method for non-streaming output.

from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # Replace with api_key="sk-xxx" if you have not set an environment variable.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # base_url for the Singapore region.
        model="qwen-plus"  # Replace the model name as needed. Model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        )
    messages = [
        {"role":"system","content":"You are a helpful assistant."},
        {"role":"user","content":"Who are you?"}
    ]
    response = llm.invoke(messages)
    print(response.model_dump_json())

if __name__ == "__main__":
    get_response()

Sample output:

{
    "content": "I am a large language model from Alibaba Cloud. My name is Qwen.",
    "additional_kwargs": {},
    "response_metadata": {
        "token_usage": {
            "completion_tokens": 16,
            "prompt_tokens": 22,
            "total_tokens": 38
        },
        "model_name": "qwen-plus",
        "system_fingerprint": "",
        "finish_reason": "stop",
        "logprobs": null
    },
    "type": "ai",
    "name": null,
    "id": "run-xxx",
    "example": false,
    "tool_calls": [],
    "invalid_tool_calls": []
}

Streaming

Use the stream method for streaming output. The stream parameter does not need to be set manually.

from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # Replace with api_key="sk-xxx" if you have not set an environment variable.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # base_url for the Singapore region.
        model="qwen-plus",  # Replace the model name as needed. Model list: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        stream_usage=True
        )
    messages = [
        {"role":"system","content":"You are a helpful assistant."},
        {"role":"user","content":"Who are you?"},
    ]
    response = llm.stream(messages)
    for chunk in response:
        print(chunk.model_dump_json())

if __name__ == "__main__":
    get_response()

Sample output:

{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "I am", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " a large", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " language model", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " from", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " Alibaba", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " Cloud", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": ", and my name is Qwen.", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {"finish_reason": "stop"}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": {"input_tokens": 22, "output_tokens": 16, "total_tokens": 38}, "tool_call_chunks": []}

For parameter details, see Request parameters. Set these parameters on the ChatOpenAI object.
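For example, sampling parameters can be set directly on the ChatOpenAI object. This is a configuration sketch; the values are illustrative and the Singapore base_url is assumed:

```python
from langchain_openai import ChatOpenAI
import os

# Configuration sketch: request parameters such as temperature, top_p, and
# max_tokens are fields on the ChatOpenAI object itself.
llm = ChatOpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    model="qwen-plus",
    temperature=0.7,   # illustrative value
    top_p=0.8,         # illustrative value
    max_tokens=512,    # illustrative value
)
```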

HTTP API

Prerequisites

Endpoint

| Region | Endpoint |
| --- | --- |
| Singapore | POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions |
| Virginia | POST https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions |
| Beijing | POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions |

Non-streaming

If you have not configured the API key as an environment variable, replace $DASHSCOPE_API_KEY with your API key.
# base_url for the Singapore region.
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Who are you?"
        }
    ]
}'

Sample output:

{
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "I am a large language model from Alibaba Cloud. My name is Qwen."
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 11,
        "completion_tokens": 16,
        "total_tokens": 27
    },
    "created": 1715252778,
    "system_fingerprint": "",
    "model": "qwen-plus",
    "id": "chatcmpl-xxx"
}

Streaming

Set "stream": true in the request body to enable streaming.

# base_url for the Singapore region.
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Who are you?"
        }
    ],
    "stream":true
}'

Sample output:

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":" a large-scale"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":" language model"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":" from Alibaba Cloud"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":", and my name is Qwen."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":""},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: [DONE]

Error response

If a request fails, the response includes code and message fields that indicate the cause.

{
    "error": {
        "message": "Incorrect API key provided. ",
        "type": "invalid_request_error",
        "param": null,
        "code": "invalid_api_key"
    }
}
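The code and message fields can be read programmatically. A minimal sketch, parsing the example body above with the standard library:

```python
import json

# Pull the machine-readable fields out of an error response body
# shaped like the example above.
body = '''{
    "error": {
        "message": "Incorrect API key provided. ",
        "type": "invalid_request_error",
        "param": null,
        "code": "invalid_api_key"
    }
}'''
err = json.loads(body)["error"]
print(err["code"], "-", err["message"].strip())
# -> invalid_api_key - Incorrect API key provided.
```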

Request parameters

The following parameters align with the OpenAI interface.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | - | Required. The model to use. See Supported models. |
| messages | array | - | Required. Conversation history. Each element has the format {"role": "Role", "content": "Content"}. Valid roles: system, user, assistant. The system role can only appear in messages[0]. Typically, user and assistant alternate, and the last element must have the user role. |
| top_p | float | - | Optional. Nucleus sampling probability threshold. For example, 0.8 means the model samples from the smallest set of tokens whose cumulative probability reaches 0.8 or higher. Valid range: (0, 1.0). Higher values increase randomness. |
| temperature | float | - | Optional. Controls output randomness. Higher values produce more diverse results; lower values produce more deterministic results. Valid range: [0, 2). Do not set this to 0. |
| presence_penalty | float | - | Optional. Reduces repetition in generated content. Higher values reduce repetition more. Valid range: [-2.0, 2.0]. Note: Supported only on Qwen commercial and open-source models of qwen1.5 and later. |
| n | integer | 1 | Optional. Number of responses to generate. Valid range: 1 to 4. Useful for creative tasks such as ad copy. Does not increase input token consumption but increases output token consumption. Important: Supported only for the qwen-plus model. Must be 1 when the tools parameter is set. |
| max_tokens | integer | - | Optional. Maximum number of tokens the model can generate. Each model has its own output limit. See the model list for details. |
| seed | integer | - | Optional. Random number seed for generation. Supports unsigned 64-bit integers. |
| stream | boolean | False | Optional. Enables streaming output. When enabled, the API returns a generator. Iterate over it to receive incremental results. |
| stop | string or array | None | Optional. Stops generation when the model is about to output a specified string or token. Accepts a string or an array. When using a string, the model stops before generating that string. When using an array, elements can be token IDs, strings, or arrays of token IDs. Note: Do not mix token IDs and strings in the same array. For example, ["Hello", 104307] is not valid. |
| tools | array | None | Optional. Defines tools the model can call. Each tool has a type field (currently only "function") and a function object with name, description, and parameters. The name field allows only letters, digits, underscores, and hyphens (max 64 characters). The parameters field must be valid JSON Schema. Supported models: qwen-turbo, qwen-plus, and qwen-max. Cannot be used with stream=True. |
| stream_options | object | None | Optional. Takes effect only when stream is True. Set to {"include_usage": true} to include token usage in the last chunk of the streaming output. |
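As a sketch, a request body that combines several of the optional parameters above might look like this (values are illustrative). In the OpenAI SDK these are passed as keyword arguments to client.chat.completions.create:

```python
# Sketch: a request body combining several optional parameters from the
# table above. Values are illustrative, not recommendations.
request_body = {
    "model": "qwen-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a one-line slogan for a tea shop."},
    ],
    "temperature": 0.7,   # more diverse output
    "top_p": 0.8,         # nucleus sampling threshold in (0, 1.0)
    "max_tokens": 128,    # cap on generated tokens
    "seed": 1234,         # reproducible sampling
    "stop": ["\n\n"],     # stop before a blank line
}
```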

Response parameters

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | System-generated ID for the call. |
| model | string | The model used. |
| system_fingerprint | string | Configuration version of the model runtime. Not currently supported; returns null. |
| choices | array | Content generated by the model. |
| choices[i].finish_reason | string | null during generation, stop when a stop condition is triggered, length when the maximum output length is reached. |
| choices[i].message | object | The model output message. |
| choices[i].message.role | string | Fixed to assistant. |
| choices[i].message.content | string | Generated text. |
| choices[i].index | integer | Sequence number of the result. Default: 0. |
| created | integer | UNIX timestamp (seconds) of when the result was generated. |
| usage | object | Token consumption for the request. |
| usage.prompt_tokens | integer | Number of tokens in the input. |
| usage.completion_tokens | integer | Number of tokens in the generated response. |
| usage.total_tokens | integer | Sum of prompt_tokens and completion_tokens. |
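A minimal sketch of reading these fields, using a dict shaped like the non-streaming sample output earlier in this topic:

```python
# Sketch: extract the answer text and token usage from a response dict.
response = {
    "id": "chatcmpl-xxx",
    "model": "qwen-plus",
    "choices": [{"index": 0, "finish_reason": "stop",
                 "message": {"role": "assistant", "content": "Hello!"}}],
    "usage": {"prompt_tokens": 22, "completion_tokens": 16, "total_tokens": 38},
}

answer = response["choices"][0]["message"]["content"]
usage = response["usage"]
# total_tokens is the sum of prompt and completion tokens.
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
print(answer, usage["total_tokens"])  # -> Hello! 38
```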

Error codes

| Code | Description |
| --- | --- |
| 400 - Invalid Request Error | Invalid request. See the error message for details. |
| 401 - Incorrect API key provided | The API key is incorrect. |
| 429 - Rate limit reached for requests | The queries per second (QPS), queries per minute (QPM), or other rate limits are exceeded. |
| 429 - You exceeded your current quota, please check your plan and billing details | Quota exceeded or the account has an overdue payment. |
| 500 - The server had an error while processing your request | A server-side error occurred. |
| 503 - The engine is currently overloaded, please try again later | The server is overloaded. Try again later. |