
Alibaba Cloud Model Studio:OpenAI compatible - Chat

Last Updated:Mar 18, 2026

The Qwen models in Alibaba Cloud Model Studio provide an OpenAI-compatible interface. To migrate your existing OpenAI code to the Model Studio service, you only need to change the API key, the base_url, and the model name.

Information for OpenAI compatibility

BASE_URL

The base_url is the network endpoint of the model service. When you call Model Studio services through the OpenAI-compatible interface, you must set the base_url to the endpoint for your region.

  • When you make calls using the OpenAI SDK or other OpenAI compatible SDKs, configure the BASE_URL as follows:

    Singapore: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
    US (Virginia): https://dashscope-us.aliyuncs.com/compatible-mode/v1
    China (Beijing): https://dashscope.aliyuncs.com/compatible-mode/v1
    Hong Kong (China): https://cn-hongkong.dashscope.aliyuncs.com/compatible-mode/v1
  • When you make calls using HTTP requests, configure the full access endpoint as follows:

    Singapore: POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
    US (Virginia): POST https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
    China (Beijing): POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
    Hong Kong (China): POST https://cn-hongkong.dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
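The endpoints above can be kept in a small lookup table so that switching regions only means changing one configuration value. A minimal sketch (the region keys are illustrative, not official identifiers):

```python
# Map region names to the OpenAI-compatible base_url values listed above.
# The region keys are illustrative; use whatever naming fits your configuration.
BASE_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "us-virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
    "beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "hongkong": "https://cn-hongkong.dashscope.aliyuncs.com/compatible-mode/v1",
}

def base_url_for(region: str) -> str:
    """Return the OpenAI-compatible base_url for a region key."""
    return BASE_URLS[region]

print(base_url_for("singapore"))
```

For HTTP calls, append /chat/completions to the selected base_url.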

Supported model list

The following table lists the Qwen series models that are supported by the OpenAI compatible interface.

Global

  • Commercial

    • Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshot versions

    • Qwen-Plus series: qwen-plus, qwen-plus-latest, qwen-plus-2025-01-25 and later snapshot versions

    • Qwen-Flash series: qwen-flash, qwen-flash-2025-07-28 and later snapshot versions

  • Open source

    • qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b

International

  • Commercial

    • Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshot versions, qwen-max, qwen-max-latest, qwen-max-2025-01-25 and later snapshot versions

    • Qwen-Plus series: qwen3.5-plus, qwen3.5-plus-2026-02-15 and later snapshot versions, qwen-plus, qwen-plus-latest, qwen-plus-2025-01-25 and later snapshot versions

    • Qwen-Flash series: qwen3.5-flash, qwen3.5-flash-2026-02-23 and later snapshot versions, qwen-flash, qwen-flash-2025-07-28

    • Qwen-Turbo series: qwen-turbo, qwen-turbo-latest, qwen-turbo-2024-11-01 and later snapshot versions

    • Qwen-Coder series: qwen3-coder-plus, qwen3-coder-plus-2025-07-22 and later snapshot versions, qwen3-coder-flash, qwen3-coder-flash-2025-07-28 and later snapshot versions

    • QwQ series: qwq-plus

  • Open source

    • qwen3.5-397b-a17b, qwen3.5-120b-a10b, qwen3.5-27b, qwen3.5-35b-a3b

    • qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b

    • qwen2.5-14b-instruct-1m, qwen2.5-7b-instruct-1m, qwen2.5-72b-instruct, qwen2.5-32b-instruct, qwen2.5-14b-instruct, qwen2.5-7b-instruct

US

  • Commercial

    • Qwen-Plus series: qwen-plus-us, qwen-plus-2025-12-01-us and later snapshot versions

    • Qwen-Flash series: qwen-flash-us, qwen-flash-2025-07-28-us

Chinese mainland

  • Commercial

    • Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshot versions, qwen-max, qwen-max-latest, qwen-max-2024-09-19 and later snapshot versions

    • Qwen-Plus series: qwen3.5-plus, qwen3.5-plus-2026-02-15 and later snapshot versions, qwen-plus, qwen-plus-latest, qwen-plus-2024-12-20 and later snapshot versions

    • Qwen-Flash series: qwen3.5-flash, qwen3.5-flash-2026-02-23 and later snapshot versions, qwen-flash, qwen-flash-2025-07-28 and later snapshot versions

    • Qwen-Turbo series: qwen-turbo, qwen-turbo-latest, qwen-turbo-2025-04-28 and later snapshot versions

    • Qwen-Coder series: qwen3-coder-plus, qwen3-coder-plus-2025-07-22 and later snapshot versions, qwen3-coder-flash, qwen3-coder-flash-2025-07-28 and later snapshot versions, qwen-coder-plus, qwen-coder-plus-latest, qwen-coder-plus-2024-11-06, qwen-coder-turbo, qwen-coder-turbo-latest, qwen-coder-turbo-2024-09-19

    • QwQ series: qwq-plus, qwq-plus-latest, qwq-plus-2025-03-05

    • Qwen-Math models: qwen-math-plus, qwen-math-plus-latest, qwen-math-plus-2024-08-16 and later snapshot versions, qwen-math-turbo, qwen-math-turbo-latest, qwen-math-turbo-2024-09-19

  • Open source

    • qwen3.5-397b-a17b, qwen3.5-120b-a10b, qwen3.5-27b, qwen3.5-35b-a3b

    • qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b

    • qwen2.5-14b-instruct-1m, qwen2.5-7b-instruct-1m, qwen2.5-72b-instruct, qwen2.5-32b-instruct, qwen2.5-14b-instruct, qwen2.5-7b-instruct, qwen2.5-3b-instruct, qwen2.5-1.5b-instruct, qwen2.5-0.5b-instruct

Hong Kong (China)

  • Qwen-Max series: qwen3-max, qwen3-max-2026-01-23 and later snapshot versions

  • Qwen-Plus series: qwen-plus, qwen-plus-2025-01-25 and later snapshot versions

  • Qwen-Flash series: qwen3.5-flash, qwen3.5-flash-2026-02-23 and later snapshot versions

Use the OpenAI SDK

Prerequisites

  • Make sure that a Python environment is installed on your computer.

  • Install the latest version of the OpenAI SDK.

    # If the following command fails, replace pip with pip3.
    pip install -U openai
  • You must activate Model Studio and get an API key. See Get an API key.

Usage

You can use the following examples to call Qwen models in Model Studio with the OpenAI SDK.

Non-streaming call example

from openai import OpenAI
import os

def get_response():
    client = OpenAI(
        # API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not configured an environment variable, replace this line with your Model Studio API key: api_key="sk-xxx"
        # The following is the base_url for the Singapore region.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  
    )
    completion = client.chat.completions.create(
        model="qwen-plus",  # This example uses qwen-plus. You can change the model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=[{'role': 'system', 'content': 'You are a helpful assistant.'},
                  {'role': 'user', 'content': 'Who are you?'}]
        )
    print(completion.model_dump_json())

if __name__ == '__main__':
    get_response()

Running the code produces the following result:

{
    "id": "chatcmpl-xxx",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "I am a large-scale pre-trained model from Alibaba Cloud. My name is Qwen.",
                "role": "assistant",
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1716430652,
    "model": "qwen-plus",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 18,
        "prompt_tokens": 22,
        "total_tokens": 40
    }
}
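The response JSON above can be navigated with the standard library once parsed: the reply text lives at choices[0].message.content, and token counts live under usage. A minimal sketch over a trimmed copy of the response shown:

```python
import json

# A trimmed copy of the non-streaming response shown above.
raw = """
{"id": "chatcmpl-xxx",
 "choices": [{"finish_reason": "stop", "index": 0,
              "message": {"role": "assistant",
                          "content": "I am a large-scale pre-trained model from Alibaba Cloud. My name is Qwen."}}],
 "model": "qwen-plus",
 "usage": {"completion_tokens": 18, "prompt_tokens": 22, "total_tokens": 40}}
"""

data = json.loads(raw)
reply = data["choices"][0]["message"]["content"]   # the generated text
total = data["usage"]["total_tokens"]              # tokens consumed by the call
print(reply)
print(f"total tokens: {total}")
```

With the SDK object itself, the same fields are available as attributes, for example completion.choices[0].message.content and completion.usage.total_tokens.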

Streaming call example

from openai import OpenAI
import os

def get_response():
    client = OpenAI(
        # API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The following is the base_url for the Singapore region.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
        
    )
    completion = client.chat.completions.create(
        model="qwen-plus",  # This example uses qwen-plus. You can change the model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=[{'role': 'system', 'content': 'You are a helpful assistant.'},
                  {'role': 'user', 'content': 'Who are you?'}],
        stream=True,
        # The following setting displays token usage information in the last line of the streaming output.
        stream_options={"include_usage": True}
        )
    for chunk in completion:
        print(chunk.model_dump_json())


if __name__ == '__main__':
    get_response()

Running the code produces the following result:

{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":"assistant","tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"I am","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" a large language model","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" from Alibaba","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" Cloud. My name is Qwen.","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":{"completion_tokens":16,"prompt_tokens":22,"total_tokens":38}}

Function call example

This example uses weather and time query tools to demonstrate how to implement function calls through the OpenAI compatible interface. The sample code can perform multi-round tool calls.

from openai import OpenAI
from datetime import datetime
import json
import os

client = OpenAI(
    # API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the base_url for the Singapore region.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  
)

# Define the list of tools. The model refers to the name and description of the tools when selecting which one to use.
tools = [
    # Tool 1: Get the current time.
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Useful when you want to know the current time.",
            # Because getting the current time requires no input parameters, parameters is an empty dictionary.
            "parameters": {}
        }
    },  
    # Tool 2: Get the weather for a specified city.
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Useful when you want to query the weather for a specified city.",
            "parameters": {  
                "type": "object",
                "properties": {
                    # A location must be provided to query the weather, so the parameter is set to location.
                    "location": {
                        "type": "string",
                        "description": "A city or district, such as Beijing, Hangzhou, or Yuhang."
                    }
                }
            },
            "required": [
                "location"
            ]
        }
    }
]

# Simulate a weather query tool. Example result: "It is rainy in Beijing today."
def get_current_weather(location):
    return f"It is rainy in {location} today. "

# Tool to query the current time. Example result: "Current time: 2024-04-15 17:15:18."
def get_current_time():
    # Get the current date and time.
    current_datetime = datetime.now()
    # Format the current date and time.
    formatted_time = current_datetime.strftime('%Y-%m-%d %H:%M:%S')
    # Return the formatted current time.
    return f"Current time: {formatted_time}."

# Encapsulate the model response function.
def get_response(messages):
    completion = client.chat.completions.create(
        model="qwen-plus",  # This example uses qwen-plus. You can change the model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=messages,
        tools=tools
        )
    return completion.model_dump()

def call_with_messages():
    print('\n')
    messages = [
            {
                "content": input('Please enter: '),  # Example questions: "What time is it now?" "What time will it be in an hour?" "What is the weather like in Beijing?"
                "role": "user"
            }
    ]
    print("-"*60)
    # First round of model call.
    i = 1
    first_response = get_response(messages)
    assistant_output = first_response['choices'][0]['message']
    print(f"\nLLM output in round {i}: {first_response}\n")
    if assistant_output['content'] is None:
        assistant_output['content'] = ""
    messages.append(assistant_output)
    # If no tool call is needed, return the final answer directly.
    if assistant_output['tool_calls'] is None:  # If the model determines that no tool call is needed, print the assistant's reply directly without a second model call.
        print(f"No tool call is needed. I can reply directly: {assistant_output['content']}")
        return
    # If a tool call is needed, perform multiple rounds of model calls until the model determines that no tool call is needed.
    while assistant_output['tool_calls'] is not None:
        # If the model determines that the weather query tool needs to be called, run the weather query tool.
        if assistant_output['tool_calls'][0]['function']['name'] == 'get_current_weather':
            tool_info = {"name": "get_current_weather", "role":"tool"}
            # Fetch the location parameter information.
            location = json.loads(assistant_output['tool_calls'][0]['function']['arguments'])['location']
            tool_info['content'] = get_current_weather(location)
        # If the model determines that the time query tool needs to be called, run the time query tool.
        elif assistant_output['tool_calls'][0]['function']['name'] == 'get_current_time':
            tool_info = {"name": "get_current_time", "role":"tool"}
            tool_info['content'] = get_current_time()
        print(f"Tool output: {tool_info['content']}\n")
        print("-"*60)
        messages.append(tool_info)
        assistant_output = get_response(messages)['choices'][0]['message']
        if assistant_output['content'] is None:
            assistant_output['content'] = ""
        messages.append(assistant_output)
        i += 1
        print(f"LLM output in round {i}: {assistant_output}\n")
    print(f"Final answer: {assistant_output['content']}")

if __name__ == '__main__':
    call_with_messages()

When you enter What is the weather in Singapore?, the model first requests a call to the get_current_weather tool. The program runs the tool, sends the result back to the model, and the model then produces a final answer based on the tool output.
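The sample code above sends the tool result back with only name, role, and content. The OpenAI specification additionally expects a tool_call_id field on tool messages, echoing the id from the assistant's tool_calls entry. If you want stricter OpenAI compatibility, the tool message can be built as follows (a sketch; the id value is illustrative):

```python
# An assistant message containing a tool call, as returned in the function
# call flow. The id value is illustrative.
assistant_output = {
    "role": "assistant",
    "content": "",
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_current_weather",
                     "arguments": '{"location": "Singapore"}'},
    }],
}

call = assistant_output["tool_calls"][0]
tool_message = {
    "role": "tool",
    "tool_call_id": call["id"],         # echoes the assistant's tool call id
    "name": call["function"]["name"],
    "content": "It is rainy in Singapore today.",
}
print(tool_message)
```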

Input parameters

The input parameters are aligned with those of the OpenAI interface. The currently supported parameters are as follows:

  • model (string, required)

    Specifies the model to call. For a list of available models, see Supported model list.

  • messages (array, required)

    The conversation history between the user and the model. Each element in the array has the format {"role": Role, "content": Content}. The available roles are system, user, and assistant. The system role is supported only in messages[0]. In most cases, the user and assistant roles must alternate, and the last element in messages must have the user role.

  • top_p (float, optional)

    The probability threshold for nucleus sampling. For example, a value of 0.8 retains the smallest set of tokens whose cumulative probability is at least 0.8 as the candidate set. Valid values are in the range (0, 1.0). A larger value produces more random output, and a smaller value produces more deterministic output.

  • temperature (float, optional)

    Controls the randomness and diversity of the model's responses. The temperature value controls the degree of smoothing applied to the probability distribution over candidate tokens during generation. A higher value flattens the distribution so that more low-probability tokens can be selected, making the output more diverse. A lower value sharpens the distribution so that high-probability tokens dominate, making the output more deterministic.

    Valid values are in the range [0, 2). We recommend that you do not set this parameter to 0, because that is not meaningful.

  • presence_penalty (float, optional)

    Controls repetition across the generated sequence. Increasing presence_penalty reduces repetition in the model's output. Valid values are in the range [-2.0, 2.0].

    Note: This parameter is currently supported only on Qwen commercial models and on open source models from qwen1.5 onwards.

  • n (integer, optional, default 1)

    The number of responses to generate. Valid values are 1 to 4. For scenarios that require multiple responses, such as creative writing or ad copy, you can set a larger value.

    A larger n does not increase input token consumption, but it does increase output token consumption. This parameter is currently supported only by the qwen-plus model, and it is fixed to 1 when the tools parameter is passed.

  • max_tokens (integer, optional)

    The maximum number of tokens that the model can generate. For example, if the model's maximum output length is 2K tokens, you can set this parameter to 1K to prevent excessively long output.

    Different models have different output limits. For details, see the model list.

  • seed (integer, optional)

    The random seed used during generation, which controls the randomness of the output. The seed accepts unsigned 64-bit integers.

  • stream (boolean, optional, default False)

    Specifies whether to use streaming output. In streaming mode, the interface returns a generator that you iterate over to get the results. Each item is the incremental output generated at that point.

  • stop (string or array, optional, default None)

    Provides precise control over the generation process: the model stops as soon as it is about to generate a specified string or token ID. The stop parameter can be a string or an array.

    • String: generation stops when the model is about to generate the specified stop word. For example, if stop is set to "hello", the model stops when it is about to generate "hello".

    • Array: the elements can be token IDs, strings, or arrays of token IDs. Generation stops when the token to be generated, or its token ID, is in the stop list. The following examples use the tokenizer of the qwen-turbo model:

      1. Elements are token IDs: the token IDs 108386 and 104307 correspond to the tokens "hello" and "weather". If stop is set to [108386,104307], the model stops when it is about to generate "hello" or "weather".

      2. Elements are strings: if stop is set to ["hello","weather"], the model stops when it is about to generate "hello" or "weather".

      3. Elements are arrays: the token IDs 108386 and 103924 correspond to the tokens "hello" and "ah", and the token IDs 35946 and 101243 correspond to the tokens "I" and "am fine". If stop is set to [[108386, 103924],[35946, 101243]], the model stops when it is about to generate "hello ah" or "I am fine".

      Note: When stop is an array, its elements must be either all token IDs or all strings. For example, you cannot set stop to ["hello", 104307].

  • tools (array, optional, default None)

    Specifies a library of tools that the model can call. In a function call flow, the model selects one tool from this library. Each tool in the tools array has the following structure:

    • type: string. The type of the tool. Currently, only function is supported.

    • function: object, with the keys name, description, and parameters:

      • name: string. The name of the tool function. It can contain letters, numbers, underscores, and hyphens, with a maximum length of 64 characters.

      • description: string. A description of the tool function that the model uses to decide when and how to call it.

      • parameters: object. The parameter description of the tool, which must be valid JSON Schema. For a description of JSON Schema, see this link. If parameters is empty, the function has no input parameters.

    In a function call flow, the tools parameter must be set both when initiating a function call round and when submitting a tool function's execution result to the model. Currently supported models include qwen-turbo, qwen-plus, and qwen-max.

    Note: The tools parameter cannot be used together with stream=True.

  • stream_options (object, optional, default None)

    Specifies whether to report token usage during streaming output. This parameter takes effect only when stream is set to True. To count tokens in streaming mode, set stream_options={"include_usage": True}.
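Most of these parameters are passed directly as keyword arguments to chat.completions.create. A sketch of a request that combines several of them (the values are illustrative):

```python
# Keyword arguments for client.chat.completions.create, combining several of
# the parameters documented above. The values are illustrative.
request = {
    "model": "qwen-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a one-line product slogan."},
    ],
    "temperature": 0.7,   # moderate randomness
    "top_p": 0.8,         # nucleus sampling threshold
    "max_tokens": 100,    # cap the reply length
    "seed": 1234,         # make sampling reproducible
    "stop": ["\n\n"],     # stop before a blank line
}

# With an OpenAI client configured for Model Studio:
# completion = client.chat.completions.create(**request)
print(sorted(request))
```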

Response parameters

  • id (string)

    The system-generated ID for this call.

  • model (string)

    The name of the model used for this call.

  • system_fingerprint (string)

    The configuration version used by the model at runtime. This is not currently supported and returns an empty string "".

  • choices (array)

    Details of the content generated by the model.

  • choices[i].finish_reason (string)

    There are three cases:

    • null: generation is in progress;

    • stop: generation ended because a stop condition in the input parameters was met;

    • length: generation ended because the output became too long.

  • choices[i].message (object)

    The message output by the model.

  • choices[i].message.role (string)

    The role of the model, fixed as assistant.

  • choices[i].message.content (string)

    The text generated by the model.

  • choices[i].index (integer)

    The sequence number of the generated result. The default is 0.

  • created (integer)

    The UNIX timestamp (in seconds) when the result was generated.

  • usage (object)

    Metering information: the token consumption of this request.

  • usage.prompt_tokens (integer)

    The length of the user input after conversion to tokens.

  • usage.completion_tokens (integer)

    The length of the model-generated reply after conversion to tokens.

  • usage.total_tokens (integer)

    The sum of usage.prompt_tokens and usage.completion_tokens.
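The relationship usage.total_tokens = prompt_tokens + completion_tokens can be checked directly, and finish_reason tells you whether the reply was truncated by max_tokens. A minimal sketch over a hand-built sample response:

```python
def summarize(response: dict) -> str:
    """Summarize a chat completion response using the fields documented above."""
    choice = response["choices"][0]
    usage = response["usage"]
    # total_tokens is the sum of prompt and completion tokens.
    assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
    if choice["finish_reason"] == "length":
        note = "reply truncated by max_tokens"
    else:
        note = "reply completed normally"
    return f'{len(choice["message"]["content"])} chars, {usage["total_tokens"]} tokens, {note}'

# A hand-built sample with the same shape as the responses above.
sample = {
    "choices": [{"finish_reason": "stop", "index": 0,
                 "message": {"role": "assistant", "content": "Hello!"}}],
    "usage": {"prompt_tokens": 22, "completion_tokens": 18, "total_tokens": 40},
}
print(summarize(sample))
```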

Use the langchain_openai SDK

Prerequisites

  • Ensure a Python environment is installed on your computer.

  • Install the langchain_openai SDK by running the following command.

    # If the following command fails, replace pip with pip3.
    pip install -U langchain_openai

Usage

Use the following examples to access Qwen models in Model Studio with the langchain_openai SDK.

Non-streaming output

Use the invoke method for non-streaming output. The following code example demonstrates this:

from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        # API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not configured an environment variable, replace this line with your Model Studio API key: api_key="sk-xxx"
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1", # This is the base_url for the Singapore region.
        model="qwen-plus"  # This example uses qwen-plus. You can change the model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        )
    messages = [
        {"role":"system","content":"You are a helpful assistant."},
        {"role":"user","content":"Who are you?"}
    ]
    response = llm.invoke(messages)
    print(response.json())

if __name__ == "__main__":
    get_response()

Running the code produces the following result:

{
    "content": "I am a large language model from Alibaba Cloud. My name is Qwen.",
    "additional_kwargs": {},
    "response_metadata": {
        "token_usage": {
            "completion_tokens": 16,
            "prompt_tokens": 22,
            "total_tokens": 38
        },
        "model_name": "qwen-plus",
        "system_fingerprint": "",
        "finish_reason": "stop",
        "logprobs": null
    },
    "type": "ai",
    "name": null,
    "id": "run-xxx",
    "example": false,
    "tool_calls": [],
    "invalid_tool_calls": []
}

Streaming output

Use the stream method for streaming output. You do not need to configure a stream parameter.

from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        # API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not configured an environment variable, replace this line with your Model Studio API key: api_key="sk-xxx"
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",   # This is the base_url for the Singapore region.
        model="qwen-plus",   # This example uses qwen-plus. You can change the model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        stream_usage=True
        )
    messages = [
        {"role":"system","content":"You are a helpful assistant."}, 
        {"role":"user","content":"Who are you?"},
    ]
    response = llm.stream(messages)
    for chunk in response:
        print(chunk.model_dump_json())

if __name__ == "__main__":
    get_response()

Running the code produces the following result:

{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "I am", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " a large language model", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " from Alibaba", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " Cloud", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": ". My name is Tongyi", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " Qwen.", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {"finish_reason": "stop"}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": {"input_tokens": 22, "output_tokens": 16, "total_tokens": 38}, "tool_call_chunks": []}

For information about input parameter configuration, see Input parameters. The relevant parameters are defined in the ChatOpenAI object.
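Generation parameters from the Input parameters section, such as temperature and max_tokens, are set on the ChatOpenAI constructor rather than on each invoke or stream call. A sketch of the constructor arguments (the values are illustrative; temperature and max_tokens are standard ChatOpenAI arguments):

```python
# Constructor arguments for langchain_openai.ChatOpenAI. Generation
# parameters are configured here rather than per call. Values are illustrative.
chat_kwargs = {
    "api_key": "sk-xxx",  # or read from the DASHSCOPE_API_KEY environment variable
    "base_url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "model": "qwen-plus",
    "temperature": 0.7,   # moderate randomness
    "max_tokens": 256,    # cap the reply length
}

# llm = ChatOpenAI(**chat_kwargs)
# response = llm.invoke(messages)
print(sorted(chat_kwargs))
```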

Use the HTTP API

You can call the service through its HTTP interface. The response has the same structure as an HTTP response from the OpenAI service.

Prerequisites

  • You must activate Model Studio and get an API key. See Get an API key.

  • We recommend that you configure the API key as an environment variable to reduce the risk of key leakage. See Configure the API key as an environment variable. You can also configure the API key in your code, but this increases the risk of leakage.

Submit an API call

Singapore: POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
US (Virginia): POST https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
China (Beijing): POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
Hong Kong (China): POST https://cn-hongkong.dashscope.aliyuncs.com/compatible-mode/v1/chat/completions

Request example

The following example shows how to call the API using a cURL command.

Note

If you have not configured the API key as an environment variable, replace $DASHSCOPE_API_KEY with your API key.

Non-streaming output

# This is the base_url for the Singapore region.
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user", 
            "content": "Who are you?"
        }
    ]
}'

The command returns the following output:

{
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "I am a large language model from Alibaba Cloud. My name is Qwen."
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 11,
        "completion_tokens": 16,
        "total_tokens": 27
    },
    "created": 1715252778,
    "system_fingerprint": "",
    "model": "qwen-plus",
    "id": "chatcmpl-xxx"
}

Streaming output

To use streaming output, set the stream parameter to true in the request body.

# This is the base_url for the Singapore region.
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user", 
            "content": "Who are you?"
        }
    ],
    "stream":true
}'

The command returns the following output:

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":" a large language model"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":" from Alibaba"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":" Cloud. My name is Qwen."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":""},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: [DONE]

For more information about the input parameters, see Input parameters.
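Each streaming line is a server-sent event: a "data: " prefix followed by a JSON chunk, with "data: [DONE]" marking the end of the stream. A minimal parsing sketch over trimmed copies of the lines shown above:

```python
import json

# Server-sent event lines as returned by the streaming endpoint above,
# trimmed to the fields used here.
sse_lines = [
    'data: {"choices":[{"delta":{"content":"I am","role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":" a large language model"}}]}',
    'data: {"choices":[{"delta":{"content":" from Alibaba"}}]}',
    'data: {"choices":[{"delta":{"content":" Cloud. My name is Qwen."}}]}',
    'data: [DONE]',
]

text = ""
for line in sse_lines:
    payload = line[len("data: "):]
    if payload == "[DONE]":          # end-of-stream sentinel
        break
    chunk = json.loads(payload)
    text += chunk["choices"][0]["delta"].get("content") or ""

print(text)
```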

Error response example

If a request fails, the output includes a code and message that indicate the reason for the error.

{
    "error": {
        "message": "Incorrect API key provided. ",
        "type": "invalid_request_error",
        "param": null,
        "code": "invalid_api_key"
    }
}

Status code description

  • 400 - Invalid Request Error

    Invalid request. See the error message for details.

  • 401 - Incorrect API key provided

    The API key is incorrect.

  • 429 - Rate limit reached for requests

    A rate limit was exceeded, such as queries per second (QPS) or queries per minute (QPM).

  • 429 - You exceeded your current quota, please check your plan and billing details

    The quota was exceeded, or the account has an overdue payment.

  • 500 - The server had an error while processing your request

    A server-side error occurred.

  • 503 - The engine is currently overloaded, please try again later

    The server is overloaded. Try again later.
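A rough client-side policy follows from these codes: 429 rate-limit, 500, and 503 errors may succeed on a later attempt (with backoff), while 400 and 401 indicate a request or credential problem that retrying cannot fix. A minimal sketch (note that a 429 caused by an exhausted quota or overdue payment needs a plan or billing fix, not a retry):

```python
# Status codes worth retrying with backoff, per the table above. A 429 caused
# by quota exhaustion or an overdue payment will keep failing until resolved.
RETRYABLE = {429, 500, 503}

def should_retry(status_code: int) -> bool:
    """Return True for errors that may succeed on a later attempt."""
    return status_code in RETRYABLE

print(should_retry(429), should_retry(401))
```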