Qwen models provide powerful natural language processing capabilities. You can integrate Qwen models into your business by using the SDK or by calling API operations over HTTP.
Model overview
The following table describes the Qwen models that you can use by calling API operations.
Name | Description | Input and output limits |
qwen-turbo | An ultra-large language model that supports multiple input languages such as Chinese and English. | This model supports a context of up to 8,000 tokens. To ensure normal model use and output, the maximum number of input tokens is limited to 6,000. |
qwen-plus | An enhanced ultra-large language model that supports multiple input languages such as Chinese and English. | This model supports a context of up to 32,000 tokens. To ensure normal model use and output, the maximum number of input tokens is limited to 30,000. |
qwen-max | A 100-billion-level ultra-large language model that supports multiple input languages such as Chinese and English. The qwen-max model is updated on a rolling basis. If you need a stable version, use a historical snapshot version. The latest qwen-max model is equivalent to the qwen-max-0428 snapshot and is the API model for Qwen2.5. | This model supports a context of up to 8,000 tokens. To ensure normal model use and output, the maximum number of input tokens is limited to 6,000. |
The limits on query frequency and number of tokens vary based on the model. Before you call a model, we recommend that you check the throttling thresholds of the model. For more information, see the Throttling thresholds section of the Billing topic.
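The input limits in the table can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions, not part of the SDK: MODEL_LIMITS and check_input_budget are hypothetical names, the numbers are copied from the table above, and the token count is assumed to be supplied by the caller.

```python
# Hypothetical helper, not part of the SDK. Limits are copied from the model overview table.
MODEL_LIMITS = {
    "qwen-turbo": {"context": 8000, "max_input": 6000},
    "qwen-plus": {"context": 32000, "max_input": 30000},
    "qwen-max": {"context": 8000, "max_input": 6000},
}


def check_input_budget(model: str, input_tokens: int) -> int:
    """Return how many input tokens are still available for the model.

    Raises ValueError if the model is unknown or the input exceeds the limit.
    """
    if model not in MODEL_LIMITS:
        raise ValueError(f"Unknown model: {model}")
    max_input = MODEL_LIMITS[model]["max_input"]
    if input_tokens > max_input:
        raise ValueError(
            f"{model} accepts at most {max_input} input tokens, got {input_tokens}"
        )
    return max_input - input_tokens


print(check_input_budget("qwen-turbo", 4500))  # 1500
```

A check like this only guards the documented input limit. It does not replace the throttling thresholds that the service enforces.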
Use SDK
You can use the SDK to access features such as single-round conversation, multi-round conversation, streaming output, and function calling.
Prerequisites
If you use Python, the SDK for Python Version 1.17.0 or later is installed.
If you use Java, the SDK for Java Version 2.12.0 or later is installed.
For more information, see Install Alibaba Cloud Model Studio SDK.
Alibaba Cloud Model Studio is activated and an API key is obtained. For more information, see Activate Alibaba Cloud Model Studio and Obtain an API key.
We recommend that you set the API key as an environment variable to reduce the risk of an API key leak. For more information, see Set API key as an environment variable. You can also configure the API key in your code, but this increases the risk of leaks.
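As an illustration of the environment-variable approach, the following minimal sketch reads the key at startup and fails early if it is missing. The variable name DASHSCOPE_API_KEY and the load_api_key helper are assumptions made for this example. Use the variable name given in Set API key as an environment variable.

```python
import os


def load_api_key(env=None, var_name="DASHSCOPE_API_KEY"):
    """Read the API key from the environment instead of hard-coding it.

    var_name is an assumed variable name; env defaults to os.environ and
    can be replaced by a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before calling the model."
        )
    return key


# Example with an explicit mapping, so that no real key appears in the code.
print(load_api_key({"DASHSCOPE_API_KEY": "sk-example"}))  # sk-example
```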
Single-round conversation
You can use Qwen in various scenarios such as content creation, translation, and text summarization. You can run the following sample code to use the single-round conversation capability of Qwen models:
import random
from http import HTTPStatus
from dashscope import Generation
import dashscope
# If the environment variable is not set, please add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
def call_with_messages():
    messages = [{'role': 'system', 'content': 'You are a helpful assistant.'},
                {'role': 'user', 'content': 'Who are you'}]
    response = Generation.call(model="qwen-turbo",
                               messages=messages,
                               # Specify the random seed. If you leave this parameter empty, the random seed is set to 1234 by default.
                               seed=random.randint(1, 10000),
                               # Set the output format to message.
                               result_format='message')
    if response.status_code == HTTPStatus.OK:
        print(response)
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))
if __name__ == '__main__':
    call_with_messages()
// Copyright (c) Alibaba, Inc. and its affiliates.
// We recommend that you use DashScope SDK for Java V2.12.0 or later.
import java.util.Arrays;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
public static GenerationResult callWithMessage() throws ApiException, NoApiKeyException, InputRequiredException {
Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
Message systemMsg = Message.builder()
.role(Role.SYSTEM.getValue())
.content("You are a helpful assistant.")
.build();
Message userMsg = Message.builder()
.role(Role.USER.getValue())
.content("Who are you")
.build();
GenerationParam param = GenerationParam.builder()
.model("qwen-turbo")
.messages(Arrays.asList(systemMsg, userMsg))
.resultFormat(GenerationParam.ResultFormat.MESSAGE)
.topP(0.8)
.build();
return gen.call(param);
}
public static void main(String[] args) {
try {
GenerationResult result = callWithMessage();
System.out.println(result);
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
// Record the error information by using a logging framework.
// Logger.error("An error occurred while calling the generation service", e);
System.err.println("An error occurred while calling the generation service: " + e.getMessage());
}
System.exit(0);
}
}
Sample response:
{
"status_code": 200,
"request_id": "dbb7fab4-6a82-92f5-896c-22ec1532c0a5",
"code": "",
"message": "",
"output": {
"text": null,
"finish_reason": null,
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "I am Qwen, a large language model created by Alibaba Cloud. I'm here to assist you with your questions and provide information on various topics. How can I help you today?"
}
}
]
},
"usage": {
"input_tokens": 22,
"output_tokens": 37,
"total_tokens": 59
}
}
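In most applications only the generated text and the token usage are needed from a response like this one. The following minimal sketch works on a plain dictionary with the same structure as the sample response. The summarize_response helper is a hypothetical name, and the sketch assumes that the response has already been converted to a dictionary.

```python
# A dictionary with the same structure and values as the sample response above.
SAMPLE_RESPONSE = {
    "status_code": 200,
    "request_id": "dbb7fab4-6a82-92f5-896c-22ec1532c0a5",
    "code": "",
    "message": "",
    "output": {
        "text": None,
        "finish_reason": None,
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": "I am Qwen, a large language model created by Alibaba Cloud. I'm here to assist you with your questions and provide information on various topics. How can I help you today?",
                },
            }
        ],
    },
    "usage": {"input_tokens": 22, "output_tokens": 37, "total_tokens": 59},
}


def summarize_response(response: dict) -> dict:
    """Extract the first choice and the token usage (hypothetical helper)."""
    choice = response["output"]["choices"][0]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": response["usage"]["total_tokens"],
    }


summary = summarize_response(SAMPLE_RESPONSE)
print(summary["finish_reason"], summary["total_tokens"])  # stop 59
```

With result_format set to message, the answer is in output.choices rather than in output.text, which is why the sample shows text as null.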
Multi-round conversation
Compared with the single-round conversation capability, the multi-round conversation capability allows the model to refer to conversation history, which is more similar to daily communication. However, the number of tokens that are consumed increases because the model refers to conversation history. You can run the following sample code to use the multi-round conversation capability of Qwen models:
from http import HTTPStatus
from dashscope import Generation
import dashscope
# If the environment variable is not set, please add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
def multi_round():
    messages = [{'role': 'system', 'content': 'You are a helpful assistant.'},
                {'role': 'user', 'content': 'Who are you'}]
    response = Generation.call(model="qwen-turbo",
                               messages=messages,
                               # Set the output format to message.
                               result_format='message')
    if response.status_code == HTTPStatus.OK:
        print(response)
        # Add the message returned by the model to the message list.
        messages.append({'role': response.output.choices[0]['message']['role'],
                         'content': response.output.choices[0]['message']['content']})
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))
        # If the response fails, delete the last user message from the message list. This ensures that the user messages and the messages returned by the model alternately appear.
        messages = messages[:-1]
    # Add the second user question to the message list.
    messages.append({'role': 'user', 'content': 'Nice to meet you'})
    # Respond to the second user question.
    response = dashscope.Generation.call(model="qwen-turbo",
                                         messages=messages,
                                         result_format='message',  # Set the output format to message.
                                         )
    if response.status_code == HTTPStatus.OK:
        print(response)
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))
if __name__ == '__main__':
    multi_round()
// Copyright (c) Alibaba, Inc. and its affiliates.
import java.util.ArrayList;
import java.util.List;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
public static GenerationParam createGenerationParam(List<Message> messages) {
return GenerationParam.builder()
.model("qwen-turbo")
.messages(messages)
.resultFormat(GenerationParam.ResultFormat.MESSAGE)
.topP(0.8)
.build();
}
public static GenerationResult callGenerationWithMessages(GenerationParam param) throws ApiException, NoApiKeyException, InputRequiredException {
Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
return gen.call(param);
}
public static void main(String[] args) {
try {
List<Message> messages = new ArrayList<>();
messages.add(createMessage(Role.SYSTEM, "You are a helpful assistant."));
messages.add(createMessage(Role.USER, "Who are you"));
GenerationParam param = createGenerationParam(messages);
GenerationResult result = callGenerationWithMessages(param);
printResult(result);
// Add the message returned by the model to the message list.
messages.add(result.getOutput().getChoices().get(0).getMessage());
// Add the second user question.
messages.add(createMessage(Role.USER, "Nice to meet you"));
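// Rebuild the request so that it explicitly carries the updated message list.
param = createGenerationParam(messages);
result = callGenerationWithMessages(param);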
printResult(result);
printResultAsJson(result);
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
e.printStackTrace();
}
System.exit(0);
}
private static Message createMessage(Role role, String content) {
return Message.builder().role(role.getValue()).content(content).build();
}
private static void printResult(GenerationResult result) {
System.out.println(result);
}
private static void printResultAsJson(GenerationResult result) {
System.out.println(JsonUtils.toJson(result));
}
}
Sample response:
{
"status_code": 200,
"request_id": "8cf046e4-4b3b-92be-ab03-d2d3152198ee",
"code": "",
"message": "",
"output": {
"text": null,
"finish_reason": null,
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "I am Qwen, a large language model created by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, responses, or creative content, while upholding the principles of providing accurate and helpful information. How can I assist you today?"
}
}
]
},
"usage": {
"input_tokens": 22,
"output_tokens": 56,
"total_tokens": 78
}
}
{
"status_code": 200,
"request_id": "75824057-7214-9b15-a701-004d88337def",
"code": "",
"message": "",
"output": {
"text": null,
"finish_reason": null,
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "Nice to meet you too! If you have any questions or need assistance, feel free to ask, and I'll do my best to help you."
}
}
]
},
"usage": {
"input_tokens": 92,
"output_tokens": 30,
"total_tokens": 122
}
}
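In the two sample responses, input_tokens grows from 22 to 92 because the second request carries the whole conversation history. The history bookkeeping used by the Python sample (append the reply on success, drop the unanswered user message on failure) can be isolated in a small helper. The following is a minimal sketch without any network call. The record_turn function is a hypothetical name, not an SDK function.

```python
def record_turn(messages, assistant_message=None):
    """Update the conversation history after a model call.

    On success, pass the assistant message and it is appended.
    On failure, pass None and the last unanswered user message is removed,
    so that user and assistant messages keep alternating.
    """
    if assistant_message is not None:
        messages.append({"role": assistant_message["role"],
                         "content": assistant_message["content"]})
    elif messages and messages[-1]["role"] == "user":
        messages.pop()
    return messages


history = [{"role": "system", "content": "You are a helpful assistant."},
           {"role": "user", "content": "Who are you"}]
# Successful call: the reply is stored in the history.
record_turn(history, {"role": "assistant", "content": "I am Qwen."})
# Failed call: the second question is removed again.
history.append({"role": "user", "content": "Nice to meet you"})
record_turn(history, None)
print([m["role"] for m in history])  # ['system', 'user', 'assistant']
```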
You can also run the following sample code to use the real-time interaction feature:
from dashscope import Generation
import dashscope
# If the environment variable is not set, please add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
def get_response(messages):
    response = Generation.call(model="qwen-turbo",
                               messages=messages,
                               # Set the output format to message.
                               result_format='message')
    return response
messages = [{'role': 'system', 'content': 'You are a helpful assistant.'}]
# Customize the number of conversation rounds. In this example, the number of conversation rounds is set to 3.
for i in range(3):
    user_input = input("Input:")
    messages.append({'role': 'user', 'content': user_input})
    assistant_output = get_response(messages).output.choices[0]['message']['content']
    messages.append({'role': 'assistant', 'content': assistant_output})
    print(f'Input: {user_input}')
    print(f'Output: {assistant_output}')
    print('\n')
import java.util.ArrayList;
import java.util.List;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import java.util.Scanner;
public class Main {
public static GenerationParam createGenerationParam(List<Message> messages) {
return GenerationParam.builder()
.model("qwen-turbo")
.messages(messages)
.resultFormat(GenerationParam.ResultFormat.MESSAGE)
.topP(0.8)
.build();
}
public static GenerationResult callGenerationWithMessages(GenerationParam param) throws ApiException, NoApiKeyException, InputRequiredException {
Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
return gen.call(param);
}
public static void main(String[] args) {
try {
List<Message> messages = new ArrayList<>();
messages.add(createMessage(Role.SYSTEM, "You are a helpful assistant."));
for (int i = 0; i < 3;i++) {
Scanner scanner = new Scanner(System.in);
System.out.print("Input:");
String userInput = scanner.nextLine();
if ("exit".equalsIgnoreCase(userInput)) {
break;
}
messages.add(createMessage(Role.USER, userInput));
GenerationParam param = createGenerationParam(messages);
GenerationResult result = callGenerationWithMessages(param);
System.out.println("Output: "+result.getOutput().getChoices().get(0).getMessage().getContent());
messages.add(result.getOutput().getChoices().get(0).getMessage());
}
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
e.printStackTrace();
}
System.exit(0);
}
private static Message createMessage(Role role, String content) {
return Message.builder().role(role.getValue()).content(content).build();
}
}
Streaming output
A large language model (LLM) does not produce its final answer in one step. It generates the answer piece by piece as intermediate results. In non-streaming output mode, the model concatenates the intermediate results and returns the complete answer at the end. In streaming output mode, the model returns each intermediate result as soon as it is generated, so you can start reading immediately and spend less time waiting for the response. To enable the streaming output mode, set the stream parameter to True if you use the SDK for Python, or call the streamCall operation if you use the SDK for Java.
from http import HTTPStatus
from dashscope import Generation
import dashscope
# If the environment variable is not set, please add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
def call_with_stream():
    messages = [
        {'role': 'user', 'content': 'Who are you'}]
    responses = Generation.call(model="qwen-turbo",
                                messages=messages,
                                result_format='message',  # Set the output format to message.
                                stream=True,  # Enable the streaming output mode.
                                incremental_output=True  # Enable the incremental streaming output mode.
                                )
    for response in responses:
        if response.status_code == HTTPStatus.OK:
            print(response.output.choices[0]['message']['content'], end='')
        else:
            print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
                response.request_id, response.status_code,
                response.code, response.message
            ))
if __name__ == '__main__':
    call_with_stream()
// Copyright (c) Alibaba, Inc. and its affiliates.
import java.util.Arrays;
import java.util.concurrent.Semaphore;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import io.reactivex.Flowable;
public class Main {
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static void handleGenerationResult(GenerationResult message, StringBuilder fullContent) {
fullContent.append(message.getOutput().getChoices().get(0).getMessage().getContent());
logger.info("Received message: {}", JsonUtils.toJson(message));
}
public static void streamCallWithMessage(Generation gen, Message userMsg)
throws NoApiKeyException, ApiException, InputRequiredException {
GenerationParam param = buildGenerationParam(userMsg);
Flowable<GenerationResult> result = gen.streamCall(param);
StringBuilder fullContent = new StringBuilder();
result.blockingForEach(message -> handleGenerationResult(message, fullContent));
logger.info("Full content: \n{}", fullContent.toString());
}
public static void streamCallWithCallback(Generation gen, Message userMsg)
throws NoApiKeyException, ApiException, InputRequiredException, InterruptedException {
GenerationParam param = buildGenerationParam(userMsg);
Semaphore semaphore = new Semaphore(0);
StringBuilder fullContent = new StringBuilder();
gen.streamCall(param, new ResultCallback<GenerationResult>() {
@Override
public void onEvent(GenerationResult message) {
handleGenerationResult(message, fullContent);
}
@Override
public void onError(Exception err) {
logger.error("Exception occurred: {}", err.getMessage());
semaphore.release();
}
@Override
public void onComplete() {
logger.info("Completed");
semaphore.release();
}
});
semaphore.acquire();
logger.info("Full content: \n{}", fullContent.toString());
}
private static GenerationParam buildGenerationParam(Message userMsg) {
return GenerationParam.builder()
.model("qwen-turbo")
.messages(Arrays.asList(userMsg))
.resultFormat(GenerationParam.ResultFormat.MESSAGE)
.topP(0.8)
.incrementalOutput(true)
.build();
}
public static void main(String[] args) {
try {
Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you").build();
streamCallWithMessage(gen, userMsg);
streamCallWithCallback(gen, userMsg);
} catch (ApiException | NoApiKeyException | InputRequiredException | InterruptedException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
}
}
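Both streaming samples enable the incremental streaming output mode and then concatenate the received pieces. The following minimal sketch shows why the concatenation depends on that setting. It assumes that each chunk carries only the newly generated text when incremental output is enabled, and the whole text generated so far when it is disabled. The collect_stream helper and the chunk strings are made up for illustration and involve no SDK call.

```python
def collect_stream(chunks, incremental_output):
    """Build the final answer from an iterable of streamed text chunks.

    incremental_output=True: chunks are deltas and must be concatenated.
    incremental_output=False: each chunk is assumed to repeat the text
    generated so far, so the last chunk is the final answer.
    """
    text = ""
    for chunk in chunks:
        if incremental_output:
            text += chunk
        else:
            text = chunk
    return text


deltas = ["I am", " Qwen", "."]
cumulative = ["I am", "I am Qwen", "I am Qwen."]
print(collect_stream(deltas, incremental_output=True))       # I am Qwen.
print(collect_stream(cumulative, incremental_output=False))  # I am Qwen.
```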
Function calling
LLMs may not provide expected answers to questions related to time-sensitive topics, private-domain knowledge, or mathematical calculation. You can use the function calling feature to improve the generated output. When you call a model, you can use the tools parameter to specify the name, description, and request parameters of a tool. After the model receives the prompt and tool information, the model determines whether to use a tool and does one of the following:
If the model does not need to use a tool, it does not return the tool_calls parameter and directly returns the generated response.
If the model needs to use a tool, it returns a message that contains the tool_calls parameter. Your application must parse the function name and request parameters of the tool from the tool_calls parameter, pass the request parameters to the tool, and obtain the result from the tool. The application then needs to format the tool information as follows:
{ "name": "$Tool name", "role": "tool", "content": "$Output generated by the tool" }
Add the tool information to conversation history, enter a question to ask the model, and then obtain the final answer.
The following figure shows the flowchart of a function call.
The information generated by function calls cannot be returned in incremental streaming output mode. For more information about the incremental streaming output mode, see the incremental_output parameter in the Request parameters section of this topic.
The application needs to parse the parameters of a tool during the function call process. Therefore, you must use a model that provides high-quality responses. We recommend that you use the qwen-max model. Sample code:
from dashscope import Generation
import dashscope
from datetime import datetime
import random
import json
# If the environment variable is not set, please add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Define a tool list. The model selects a tool based on the name and description of the tool.
tools = [
    # Use Tool 1 to query the current time.
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "This tool can help you query the current time.",
            "parameters": {}  # You can query the current time without the need to specify request parameters. Therefore, the parameters parameter is left empty.
        }
    },
    # Use Tool 2 to query the weather of a city.
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "This tool can help you query the weather of a city.",
            "parameters": {  # The location parameter specifies the location whose weather you want to query. Therefore, the location parameter is specified in the parameters parameter.
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city, county, or district, such as Beijing, Hangzhou, or Yuhang."
                    }
                }
            },
            "required": [
                "location"
            ]
        }
    }
]
# Simulate the weather query tool. Sample response: "It is sunny today in Beijing."
def get_current_weather(location):
    return f"It is sunny today in {location}. "
# Simulate the tool that is used to query the current time. Sample response: "Current time: 2024-04-15 17:15:18. "
def get_current_time():
    # Query the current date and time.
    current_datetime = datetime.now()
    # Format the current date and time.
    formatted_time = current_datetime.strftime('%Y-%m-%d %H:%M:%S')
    # Return the formatted current date and time.
    return f"Current time: {formatted_time}."
# Encapsulate the response function of the model.
def get_response(messages):
    response = Generation.call(
        model='qwen-max',
        messages=messages,
        tools=tools,
        seed=random.randint(1, 10000),  # Specify the random seed. If you leave this parameter empty, the random seed is set to 1234 by default.
        result_format='message'  # Set the output format to message.
    )
    return response
def call_with_messages():
    print('\n')
    messages = [
        {
            "content": input('Input:'),  # Sample questions: "What time is it now?" "What is the time in an hour?" "What is the weather like in Beijing?"
            "role": "user"
        }
    ]
    # Call the model in the first round.
    first_response = get_response(messages)
    assistant_output = first_response.output.choices[0].message
    print(f"\nResponse returned by the model in the first round: {first_response}\n")
    messages.append(assistant_output)
    if 'tool_calls' not in assistant_output:  # If the model determines that no tool is required, display the answer generated by the model without the need to call the model in the second round.
        print(f"Final answer: {assistant_output.content}")  # Return the final answer generated by the model. You can specify the content of the final answer to be returned when no tool is called based on your business requirements.
        return
    # The following sample code provides an example if the get_current_weather tool is called:
    elif assistant_output.tool_calls[0]['function']['name'] == 'get_current_weather':
        tool_info = {"name": "get_current_weather", "role":"tool"}
        location = json.loads(assistant_output.tool_calls[0]['function']['arguments'])['properties']['location']
        tool_info['content'] = get_current_weather(location)
    # The following sample code provides an example if the get_current_time tool is called:
    elif assistant_output.tool_calls[0]['function']['name'] == 'get_current_time':
        tool_info = {"name": "get_current_time", "role":"tool"}
        tool_info['content'] = get_current_time()
    print(f"Output generated by the tool: {tool_info['content']}\n")
    messages.append(tool_info)
    # Call the model in the second round to summarize the output generated by the tool.
    second_response = get_response(messages)
    print(f"Response returned by the model in the second round: {second_response}\n")
    print(f"Final answer: {second_response.output.choices[0].message['content']}")
if __name__ == '__main__':
    call_with_messages()
// Copyright (c) Alibaba, Inc. and its affiliates.
// We recommend that you use DashScope SDK for Java V2.12.0 or later.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import com.alibaba.dashscope.aigc.conversation.ConversationParam.ResultFormat;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationOutput.Choice;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.tools.FunctionDefinition;
import com.alibaba.dashscope.tools.ToolCallBase;
import com.alibaba.dashscope.tools.ToolCallFunction;
import com.alibaba.dashscope.tools.ToolFunction;
import com.alibaba.dashscope.utils.JsonUtils;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.github.victools.jsonschema.generator.Option;
import com.github.victools.jsonschema.generator.OptionPreset;
import com.github.victools.jsonschema.generator.SchemaGenerator;
import com.github.victools.jsonschema.generator.SchemaGeneratorConfig;
import com.github.victools.jsonschema.generator.SchemaGeneratorConfigBuilder;
import com.github.victools.jsonschema.generator.SchemaVersion;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Scanner;
public class Main {
public class GetWhetherTool {
private String location;
public GetWhetherTool(String location) {
this.location = location;
}
public String call() {
return "It is sunny today in " + location + ".";
}
}
public class GetTimeTool {
public GetTimeTool() {
}
public String call() {
LocalDateTime now = LocalDateTime.now();
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
String currentTime = "Current time:" + now.format(formatter) + ".";
return currentTime;
}
}
public static void SelectTool()
throws NoApiKeyException, ApiException, InputRequiredException {
SchemaGeneratorConfigBuilder configBuilder =
new SchemaGeneratorConfigBuilder(SchemaVersion.DRAFT_2020_12, OptionPreset.PLAIN_JSON);
SchemaGeneratorConfig config = configBuilder.with(Option.EXTRA_OPEN_API_FORMAT_VALUES)
.without(Option.FLATTENED_ENUMS_FROM_TOSTRING).build();
SchemaGenerator generator = new SchemaGenerator(config);
ObjectNode jsonSchema_whether = generator.generateSchema(GetWhetherTool.class);
ObjectNode jsonSchema_time = generator.generateSchema(GetTimeTool.class);
FunctionDefinition fd_whether = FunctionDefinition.builder().name("get_current_whether").description("Queries the weather of a location.")
.parameters(JsonUtils.parseString(jsonSchema_whether.toString()).getAsJsonObject()).build();
FunctionDefinition fd_time = FunctionDefinition.builder().name("get_current_time").description("Queries the current time.")
.parameters(JsonUtils.parseString(jsonSchema_time.toString()).getAsJsonObject()).build();
Message systemMsg = Message.builder().role(Role.SYSTEM.getValue())
.content("You are a helpful assistant. When asked a question, use tools wherever possible.")
.build();
Scanner scanner = new Scanner(System.in);
System.out.print("\nInput:");
String userInput = scanner.nextLine();
Message userMsg =
Message.builder().role(Role.USER.getValue()).content(userInput).build();
List<Message> messages = new ArrayList<>();
messages.addAll(Arrays.asList(systemMsg, userMsg));
GenerationParam param = GenerationParam.builder().model(Generation.Models.QWEN_MAX)
.messages(messages).resultFormat(ResultFormat.MESSAGE)
.tools(Arrays.asList(ToolFunction.builder().function(fd_whether).build(),ToolFunction.builder().function(fd_time).build())).build();
// Call the model in the first round.
Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
GenerationResult result = gen.call(param);
System.out.println("\nResponse returned by the model in the first round:"+JsonUtils.toJson(result));
for (Choice choice : result.getOutput().getChoices()) {
messages.add(choice.getMessage());
// Call a tool.
if (result.getOutput().getChoices().get(0).getMessage().getToolCalls() != null) {
for (ToolCallBase toolCall : result.getOutput().getChoices().get(0).getMessage()
.getToolCalls()) {
if (toolCall.getType().equals("function")) {
// Parse the function name and request parameters of the tool.
String functionName = ((ToolCallFunction) toolCall).getFunction().getName();
String functionArgument = ((ToolCallFunction) toolCall).getFunction().getArguments();
// The model determines whether to call the get_current_whether tool.
if (functionName.equals("get_current_whether")) {
GetWhetherTool GetWhetherFunction =
JsonUtils.fromJson(functionArgument, GetWhetherTool.class);
String whether = GetWhetherFunction.call();
Message toolResultMessage = Message.builder().role("tool")
.content(String.valueOf(whether)).toolCallId(toolCall.getId()).build();
messages.add(toolResultMessage);
System.out.println("\nOutput generated by the tool:"+whether);
}
// The model determines whether to call the get_current_time tool.
else if (functionName.equals("get_current_time")) {
GetTimeTool GetTimeFunction =
JsonUtils.fromJson(functionArgument, GetTimeTool.class);
String time = GetTimeFunction.call();
Message toolResultMessage = Message.builder().role("tool")
.content(String.valueOf(time)).toolCallId(toolCall.getId()).build();
messages.add(toolResultMessage);
System.out.println("\nOutput generated by the tool:"+time);
}
}
}
}
// Return the final answer generated by the model if no tool is required.
else {
// Return the final answer generated by the model. You can specify the content of the final answer to be returned when no tool is called based on your business requirements.
System.out.println("\nFinal answer:"+result.getOutput().getChoices().get(0).getMessage().getContent());
return;
}
}
// Call the model in the second round to generate the answer that contains the output generated by the tool.
param.setMessages(messages);
result = gen.call(param);
System.out.println("\nResponse returned by the model in the second round:"+JsonUtils.toJson(result));
System.out.println(("\nFinal answer:"+result.getOutput().getChoices().get(0).getMessage().getContent()));
}
public static void main(String[] args) {
try {
SelectTool();
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
System.out.println(String.format("Exception %s", e.getMessage()));
}
System.exit(0);
}
}
The following examples show the response that the model returns in the first round of the function calling process. If you enter "Weather in Hangzhou", the model returns the tool_calls parameter. If you enter "Hello", the model determines that no tool is required and does not return the tool_calls parameter.
Enter "Weather in Hangzhou"
{
"status_code": 200,
"request_id": "bd803417-56a7-9597-9d3f-a998a35b0477",
"code": "",
"message": "",
"output": {
"text": null,
"finish_reason": null,
"choices": [
{
"finish_reason": "tool_calls",
"message": {
"role": "assistant",
"content": "",
"tool_calls": [
{
"function": {
"name": "get_current_weather",
"arguments": "{\"properties\": {\"location\": \"Hangzhou\"}, \"type\": \"object\"}"
},
"id": "",
"type": "function"
}
]
}
}
]
},
"usage": {
"input_tokens": 222,
"output_tokens": 27,
"total_tokens": 249
}
}
Enter "Hello"
{
"status_code": 200,
"request_id": "28e9d70c-c4d7-9bfb-bd07-8cf4228dda91",
"code": "",
"message": "",
"output": {
"text": null,
"finish_reason": null,
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "Hello! What can I do for you? You can ask questions about the weather, time, or other things."
}
}
]
},
"usage": {
"input_tokens": 221,
"output_tokens": 21,
"total_tokens": 242
}
}
You can refer to the definitions of tools in the sample code to add more tools based on your business requirements.
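As a minimal sketch of how the first-round responses above could be inspected, the following Python snippet checks finish_reason and decodes the arguments string with json.loads. The helper name extract_tool_call is hypothetical and not part of the SDK, and the response is assumed to be available as a dictionary in the message format.

```python
import json


def extract_tool_call(response):
    """Return (name, arguments) of the first tool call, or None if no tool is required.

    extract_tool_call is a hypothetical helper used only for illustration.
    """
    choice = response["output"]["choices"][0]
    tool_calls = choice["message"].get("tool_calls")
    if choice.get("finish_reason") != "tool_calls" or not tool_calls:
        return None
    function = tool_calls[0]["function"]
    # The arguments field is a JSON-encoded string, so it must be decoded.
    return function["name"], json.loads(function["arguments"])


# Trimmed copies of the two sample responses shown above.
weather_response = {
    "output": {"choices": [{
        "finish_reason": "tool_calls",
        "message": {"role": "assistant", "content": "", "tool_calls": [{
            "function": {
                "name": "get_current_weather",
                "arguments": "{\"properties\": {\"location\": \"Hangzhou\"}, \"type\": \"object\"}",
            },
            "id": "",
            "type": "function",
        }]},
    }]}
}
hello_response = {
    "output": {"choices": [{
        "finish_reason": "stop",
        "message": {"role": "assistant", "content": "Hello! What can I do for you?"},
    }]}
}

print(extract_tool_call(weather_response))
print(extract_tool_call(hello_response))
```

Returning None when no tool is required keeps the calling code simple: it can print the final answer directly in that case and only run the second round when a tool call is present.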
Request parameters
The responses generated by a model are determined by request parameters, such as prompt, model, stream, and temperature. The following table describes the request parameters that you can specify when you call a model.
The Type column contains the following data types:
string
array: the List type in Python or the ArrayList type in Java.
integer
float
boolean
object: the hash table.
Parameter | Type | Description |
model | string | Required. The name of the Qwen model to be called for conversations. Valid values: qwen-turbo, qwen-plus, and qwen-max. |
messages | array |
Note You need to specify one of the messages and prompt parameters. The history parameter that can be used together with the prompt parameter will be discontinued. If you use only the prompt parameter, the Qwen model may have limits on recording conversation history. The messages parameter allows the Qwen model to refer to conversation history. This way, the Qwen model can parse the intention of the user more accurately and ensure the context and continuity of conversations. Therefore, we recommend that you use the messages parameter in multi-round conversation scenarios. |
prompt | string | |
history | array | This parameter will be discontinued. We recommend that you use the messages parameter. The conversation history between the user and the Qwen model. Each element in the array specifies a round of conversation. Specify each element in the following format: {"user":"User question","bot":"Answer generated by the model"}. Specify multiple rounds of conversations in chronological order. Default value: []. |
seed | integer | Optional. The random seed used during content generation. This parameter controls the randomness of the content generated by the model. Valid values: 64-bit unsigned integers. Default value: 1234. |
max_tokens | integer | Optional. The maximum number of tokens that can be generated by the model.
|
top_p | float | Optional. The probability threshold of nucleus sampling. For example, if this parameter is set to 0.8, the model selects the smallest set of tokens whose cumulative probability is greater than or equal to 0.8. A greater value introduces more randomness to the generated content. Valid values: (0,1.0). Default value: 0.8. |
top_k | integer | Optional. The size of the candidate set for sampling. For example, if this parameter is set to 50, only the 50 tokens with the highest scores generated at a time are used as the candidate set for random sampling. A greater value introduces more randomness to the generated content. By default, the top_k parameter is left empty. If the top_k parameter is left empty or set to a value greater than 100, the top_k policy is disabled. In this case, only the top_p policy takes effect. |
repetition_penalty | float | Optional. The repetition of the content generated by the model. A greater value indicates lower repetition. A value of 1.0 specifies no repetition penalty. No valid values are specified for this parameter. Default value: 1.1. |
temperature | float | Optional. The randomness and diversity of the generated content. To be specific, the value of this parameter controls the probability distribution from which the model samples each word. A greater value indicates that more low-probability words are selected and the generated content is more diversified. A smaller value indicates that more high-probability words are selected and the generated content is more predictable. Valid values: [0,2). We recommend that you do not set this parameter to 0, which is meaningless. Default value: 0.85. |
stop | string or array | Optional. If you specify a string or token ID for this parameter, the model stops generating content when the string or token is about to be generated. The value of the stop parameter can be a string or an array.
|
stream | boolean | Optional. Specifies whether to enable streaming output mode. In streaming output mode, the model returns a generator. You need to use an iterative loop to fetch the results from the generator and incrementally display the text. You can change the output mode to non-incremental by setting the incremental_output parameter to False. Default value: False. |
enable_search | boolean | Optional. Specifies whether to enable the Internet search feature for reference during content generation. Valid values:
|
result_format | string | Optional. The output format of the response. Valid values: text and message. For more information about the message format, see the Sample responses section of this topic. We recommend that you set the output format to message. Default value: text. |
incremental_output | boolean | Optional. Specifies whether to enable the incremental streaming output mode. If you set this parameter to True, the incremental streaming output mode is enabled and the subsequent returned content excludes the historical returned content. If you set this parameter to False, the incremental streaming output mode is disabled and the subsequent returned content includes the historical returned content. For more information, see the sample code in the Streaming output section of this topic. Examples:
This parameter takes effect only if the stream parameter is set to True. Default value: False. Note The incremental_output parameter cannot be used together with the tools parameter. |
tools | array | A list of tools that can be called by the model. The model calls a tool from the tool list during each function call process. A tool in the tool list contains the following parameters:
To use the tools parameter, you must set the result_format parameter to message. During a function call process, you must specify the tools parameter regardless of whether you initiate a round of function call or submit the results of a tool function to the model. Supported models include qwen-turbo, qwen-plus, and qwen-max. Note The tools parameter cannot be used together with the incremental_output parameter. |
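To illustrate how the top_k and top_p descriptions in the table interact, the following sketch filters a toy probability distribution. It is not the sampling code of the Qwen service: the function filter_candidates, the example probabilities, and the choice to apply top_k before top_p are assumptions made for illustration. Only the thresholds (top_k disabled when empty or greater than 100, top_p as a cumulative probability) come from the table.

```python
def filter_candidates(probs, top_p=0.8, top_k=None):
    """Return the tokens that remain candidates for random sampling.

    probs maps each token to its probability. This is an illustrative
    sketch, not the service's actual implementation.
    """
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    # top_k: keep only the k highest-scoring tokens.
    # An empty value or a value greater than 100 disables the policy.
    if top_k is not None and top_k <= 100:
        ranked = ranked[:top_k]
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, probability in ranked:
        kept.append(token)
        cumulative += probability
        if cumulative >= top_p:
            break
    return kept


toy = {"sunny": 0.5, "cloudy": 0.25, "rainy": 0.125, "snowy": 0.125}
print(filter_candidates(toy, top_p=0.75))           # two most likely tokens
print(filter_candidates(toy, top_p=0.9, top_k=1))   # top_k limits the set to one token
print(filter_candidates(toy, top_p=0.9, top_k=200)) # top_k > 100, only top_p applies
```

A larger top_p or top_k leaves more tokens in the candidate set, which is why both parameters are described as introducing more randomness when they are increased.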
Sample responses
The sample response when the result_format parameter is set to message:
{ "status_code": 200, "request_id": "75824057-7214-9b15-a701-004d88337def", "code": "", "message": "", "output": { "text": null, "finish_reason": null, "choices": [ { "finish_reason": "stop", "message": { "role": "assistant", "content": "Nice to meet you too! If you have any questions or need assistance, feel free to ask, and I'll do my best to help you." } } ] }, "usage": { "input_tokens": 92, "output_tokens": 30, "total_tokens": 122 } }
The sample response when a function call is initiated:
{ "status_code": 200, "request_id": "a2b49cd7-ce21-98ff-87ac-b00cc590dc5e", "code": "", "message": "", "output": { "text": null, "finish_reason": null, "choices": [ { "finish_reason": "tool_calls", "message": { "role": "assistant", "content": "", "tool_calls": [ { "function": { "name": "get_current_weather", "arguments": "{\"properties\": {\"location\": \"Beijing\"}}" }, "id": "", "type": "function" } ] } } ] }, "usage": { "input_tokens": 12, "output_tokens": 98, "total_tokens": 110 } }
Response parameters
Parameter | Type | Description | Note |
status_code | integer | The response code. The status code 200 indicates that the request is successful. Other status codes indicate that the request failed. If the request failed, the corresponding error code and error message are returned for the code and message parameters. | This parameter is returned only in Python. If a request failed in Java, an error is reported and the error code and error message are returned for the code and message parameters. |
request_id | string | The request ID. | |
code | string | The error code that is returned if the request failed. If the request was successful, no value is returned for this parameter. This parameter is returned only in Python. | |
message | string | The error message that is returned if the request failed. If the request was successful, no value is returned for this parameter. This parameter is returned only in Python. | |
output | object | The returned results. | |
output.text | string | The answer that is generated by the model. | A value is returned if the prompt parameter is specified. |
output.finish_reason | string | The reason why the model stops generating the answer. Valid values: null (the model is generating the answer), stop (the content generated by the model triggers the stop conditions), length (the content generated by the model is excessively long), and tool_calls (a tool is called during content generation). | |
output.choices | array | The choices that are returned if the result_format parameter is set to message. | |
output.choices[i].finish_reason | string | The reason why the model stops generating the answer. Valid values: null (the model is generating the answer), stop (the content generated by the model triggers the stop conditions), length (the content generated by the model is excessively long), and tool_calls (a tool is called during content generation). | |
output.choices[i].message | object | The message returned by the model. | |
output.choices[i].message.role | string | The role of the model. Only assistant can be returned. | |
output.choices[i].message.content | string | The content generated by the model. | |
output.choices[i].message.tool_calls | object | The tool_calls parameter that is returned if the model needs to call a tool. A tool contains the type, function, and id parameters. For more information, see the Sample responses section of this topic. type: the type of the tool. The value is a string. Only function may be returned. function: the function of the tool. The value is an object that contains the name and arguments parameters. name: the name of the tool function to be called. arguments: the request parameters to be passed to the tool. You can parse the value of the arguments parameter to a dictionary by using the json.loads method in Python. | This parameter is used when a tool is called. |
usage | object | The number of tokens that are consumed during the request. | |
usage.input_tokens | integer | The number of tokens that are converted from the input text. | For more information about how to calculate tokens, see the Convert strings into tokens and convert tokens back into strings section of the Billing topic. |
usage.output_tokens | integer | The number of tokens that are converted from the answer generated by the model. | |
usage.total_tokens | integer | The total number of tokens that are converted from the input text and tokens that are converted from the answer generated by the model. | |
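The two output formats place the answer in different fields: output.text for the text format and output.choices[i].message.content for the message format. The following sketch reads the answer from either one. It assumes that the response has already been converted to a dictionary, and extract_answer is a hypothetical helper rather than an SDK function.

```python
def extract_answer(response):
    """Return the generated text from a response in either output format."""
    output = response.get("output") or {}
    choices = output.get("choices")
    if choices:
        # result_format="message": the answer is in choices[0].message.content.
        return choices[0]["message"]["content"]
    # result_format="text": the answer is in output.text.
    return output.get("text")


message_response = {
    "status_code": 200,
    "output": {
        "text": None,
        "finish_reason": None,
        "choices": [{
            "finish_reason": "stop",
            "message": {"role": "assistant", "content": "Nice to meet you too!"},
        }],
    },
    "usage": {"input_tokens": 92, "output_tokens": 30, "total_tokens": 122},
}
text_response = {
    "status_code": 200,
    "output": {"text": "Nice to meet you too!", "finish_reason": "stop"},
}

print(extract_answer(message_response))
print(extract_answer(text_response))
```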
Use HTTP
Overview
You can call API operations over HTTP to use Qwen models. This eliminates the need to install the SDK. The HTTP and HTTP Server-Sent Events (SSE) protocols are supported. You can send requests over one of the protocols based on your business requirements.
Prerequisites
Alibaba Cloud Model Studio is activated and an API key is created. For more information, see Obtain an API key.
Request syntax
POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
Request parameters
The following table describes the request parameters. The Type column contains the following data types:
string
array
integer
float
boolean
object: the hash table.
Component | Parameter | Type | Description | Example |
Header | Content-Type | string | The request type. Set the value to application/json. | "Content-Type":"application/json" |
Accept | string | Optional. Specifies whether to enable SSE. If you set this parameter to text/event-stream, SSE is enabled. By default, this parameter is left empty. | "Accept":"text/event-stream" | |
Authorization | string | The API key. | "Authorization":"Bearer d1**2a" | |
X-DashScope-WorkSpace | string | Optional. The name of the workspace to be used for this call. This parameter is required if the API key of a Resource Access Management (RAM) user is used. In addition, the specified workspace must contain the RAM user. This parameter is optional if the API key of an Alibaba Cloud account is used. If you specify a workspace, the corresponding identity in the workspace is used. If you leave this parameter empty, the identity of the Alibaba Cloud account is used. | ws_QTggmeAxxxxx | |
X-DashScope-SSE | string | Optional. Specifies whether to enable SSE. You can set this parameter to enable or the Accept parameter to text/event-stream to enable SSE. | "X-DashScope-SSE":"enable" | |
Body | model | string | The name of the Qwen model to be called for conversations. Valid values: qwen-turbo, qwen-plus, qwen-max. | "model":"qwen-turbo" |
input | object | The information that you enter for the model. | ||
input.prompt | string | Optional. The prompt that you want the model to execute. You can enter a prompt in Chinese or English. You can specify one of the input.messages and input.prompt parameters. Note A period (.) in the parameter name indicates that the information after the period is the attribute of the information before the period. In the API testing tool, you cannot set the key to input.prompt. You can specify this parameter in the following format: "input":{"prompt":"xxx"}. | "input":{"prompt":"Hello"} | |
input.history | array | This parameter will be discontinued. We recommend that you use the input.messages parameter. Optional. The conversation history between the user and the model. Each element in the array specifies a round of conversation. Specify each element in the following format: {"user":"User question","bot":"Answer generated by the model"}. Specify multiple rounds of conversations in chronological order. | "input":{"history":[{"user":"How is the weather today?", "bot":"It is a nice day. Do you want to go out?"}, {"user":"What do you recommend?", "bot":"I suggest that you go to the park. Spring is coming and the flowers are blooming. It is very beautiful."}]} | |
input.messages | array | Optional. The conversation history between the user and the model. Specify each element in the array in the following format: {"role": Role, "content": Content}. For example, if the role parameter is set to tool, specify each element in the array in the following format:
Valid values of the role parameter: system, user, assistant, and tool. These parameters are required if the input.messages parameter is specified. | "input":{ "messages":[ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "Hello, where is the museum nearby?" }] } | |
input.messages.role | string | |||
input.messages.content | string | |||
input.messages.name | string | Optional. If the role parameter is set to tool, the messages are the results of the function call. name specifies the function name of the tool, which must be the same as the value of the tool_calls[i].function.name parameter that is returned in the previous response. content specifies the output generated by the tool. For more information, see the sample code in the Function calling section of this topic. This parameter is required if the input.messages.role parameter is set to tool. | ||
parameters | object | Optional. The parameters used to control the content generated by the model. | ||
parameters.result_format | string | Optional. The output format of the response. Default value: text. You can also set this parameter to message. For more information about the message format, see the Sample responses section of this topic. We recommend that you set the output format to message. | "parameters":{"result_format":"message"} | |
parameters.seed | integer | Optional. The random seed used during content generation. This parameter controls the randomness of the content generated by the model. Valid values: 64-bit unsigned integers. Default value: 1234. If you specify seed, the model tries to generate the same or similar content for the output of each model call. However, the model cannot ensure that the output is exactly the same for each model call. | "parameters":{"seed":666} | |
parameters.max_tokens | integer | Optional. The maximum number of tokens that can be generated by the model.
| "parameters":{"max_tokens":1500} | |
parameters.top_p | float | Optional. The probability threshold of nucleus sampling. For example, if this parameter is set to 0.8, the model selects the smallest set of tokens whose cumulative probability is greater than or equal to 0.8. A greater value introduces more randomness to the generated content. Valid values: (0,1.0). Default value: 0.8. | "parameters":{"top_p":0.7} | |
parameters.top_k | integer | Optional. The size of the candidate set for sampling. For example, if this parameter is set to 50, only the 50 tokens with the highest scores generated at a time are used as the candidate set for random sampling. A greater value introduces more randomness to the generated content. By default, the top_k parameter is left empty. If the top_k parameter is left empty or set to a value greater than 100, the top_k policy is disabled. In this case, only the top_p policy takes effect. | "parameters":{"top_k":50} | |
parameters.repetition_penalty | float | Optional. The repetition of the content generated by the model. A greater value indicates lower repetition. A value of 1.0 specifies no repetition penalty. No valid values are specified for this parameter. Default value: 1.1. | "parameters":{"repetition_penalty":1.0} | |
parameters.temperature | float | Optional. The randomness and diversity of the generated content. To be specific, the value of this parameter controls the probability distribution from which the model samples each word. A greater value indicates that more low-probability words are selected and the generated content is more diversified. A smaller value indicates that more high-probability words are selected and the generated content is more predictable. Valid values: [0,2). We recommend that you do not set this parameter to 0, which is meaningless. Default value: 0.85. | "parameters":{"temperature":0.85} | |
parameters.stop | string/array | Optional. If you specify a string or token ID for this parameter, the model stops generating content when the string or token is about to be generated. The value of the stop parameter can be a string or an array.
Note If the stop parameter is an array, the array cannot contain both token IDs and strings. For example, you cannot set the stop parameter to | "parameters":{"stop":["Hello","Weather"]} | |
parameters.enable_search | boolean | Optional. Specifies whether to enable the Internet search feature for reference during content generation. Valid values:
| "parameters":{"enable_search":false} | |
parameters.incremental_output | boolean | Optional. Specifies whether to enable the incremental streaming output mode. If you set this parameter to True, the incremental streaming output mode is enabled and the subsequent returned content excludes the historical returned content. If you set this parameter to False, the incremental streaming output mode is disabled and the subsequent returned content includes the historical returned content. Examples:
This parameter takes effect only if SSE is enabled. Default value: False. Note The incremental_output parameter cannot be used together with the tools parameter. | "parameters":{"incremental_output":false} | |
parameters.tools | array | Optional. A list of tools that can be called by the model. The model calls a tool from the tool list during each function call process. A tool in the tool list contains the following parameters:
To use the tools parameter, you must set the result_format parameter to message. During a function call process, you must specify the tools parameter regardless of whether you initiate a round of function call or submit the results of a tool function to the model. Supported models include qwen-turbo, qwen-plus, qwen-max, and qwen-max-longcontext. Note The tools parameter cannot be used together with the incremental_output parameter. |
|
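Because input.history will be discontinued in favor of input.messages, existing history data may need to be converted. The following sketch shows one possible conversion and builds a request body from the result. The helper history_to_messages is hypothetical, and mapping the bot field to the assistant role is an assumption based on the role values listed in the table.

```python
import json


def history_to_messages(history, user_input, system_prompt=None):
    """Convert history rounds ({"user": ..., "bot": ...}) into a messages list."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    # Rounds are expected in chronological order, as in the history parameter.
    for conversation_round in history:
        messages.append({"role": "user", "content": conversation_round["user"]})
        # Assumption: the "bot" answer corresponds to the assistant role.
        messages.append({"role": "assistant", "content": conversation_round["bot"]})
    messages.append({"role": "user", "content": user_input})
    return messages


history = [
    {"user": "How is the weather today?",
     "bot": "It is a nice day. Do you want to go out?"},
]
messages = history_to_messages(
    history, "What do you recommend?", system_prompt="You are a helpful assistant.")

body = {
    "model": "qwen-turbo",
    "input": {"messages": messages},
    "parameters": {"result_format": "message"},
}
print(json.dumps(body, indent=2))
```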
Response parameters
Parameter | Type | Description | Example |
output.text | string | The output content returned by the model. A value is returned for this parameter if the result_format parameter is set to text. | I suggest that you go to the Summer Palace. |
output.finish_reason | string | The reason why the model stops generating the answer. Valid values:
A value is returned for this parameter if the result_format parameter is set to text. | stop |
output.choices | array | The choices that are returned if the result_format parameter is set to message. |
|
output.choices[x].finish_reason | string | The reason why the model stops generating the answer. Valid values: null: The model is generating the answer.
| |
output.choices[x].message | object | Each message is displayed in the following format: {"role": Role, "content": Content}. Valid values of the role parameter: content indicates the output content returned by the model. | |
output.choices[x].message.role | string | ||
output.choices[x].message.content | string | ||
output.choices[x].message.tool_calls | object | The tool_calls parameter that is returned if the model needs to call a tool. This parameter is used when a tool is called. A tool contains the type and function parameters. For more information, see the sample code in the Function calling section of this topic. The following list describes the parameters:
| |
usage | object | The number of tokens that are consumed during the model call. | |
usage.output_tokens | integer | The number of tokens that are converted from the answer generated by the model. | 380 |
usage.input_tokens | integer | The number of tokens that are converted from the input content. If you set the enable_search parameter to true, the value of this parameter is greater than the number of tokens that are converted from the actual input content. This is because more tokens are converted from the information queried by the Internet search feature. | 633 |
usage.total_tokens | integer | The total number of tokens that are converted from the input text and tokens that are converted from the answer generated by the model. | 1013 |
request_id | string | The request ID. | 7574ee8f-38a3-4b1e-9280-11c33ab46e51 |
Sample requests with SSE disabled
The following sample code shows how to use a cURL command or a Python script to call a Qwen model when SSE is disabled:
You must replace $your-dashscope-api-key in the sample code with your API key.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation' \
--header 'Authorization: Bearer $your-dashscope-api-key' \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen-turbo",
"input":{
"messages":[
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello, which park is closest to me?"
}
]
},
"parameters": {
"result_format": "message"
}
}'
import requests
url = 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation'
headers = {'Content-Type': 'application/json',
'Authorization':'Bearer $your-dashscope-api-key'}
body = {
'model': 'qwen-turbo',
"input":{
"messages":[
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello, which park is closest to me?"
}]
},
}
response = requests.post(url, headers=headers, json=body)
print(response.text)
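Instead of pasting the API key into the script, the request headers can be assembled from the header table above. The following is a sketch under stated assumptions: build_headers is a hypothetical helper, and reading the key from an environment variable named DASHSCOPE_API_KEY is an assumption of this example, not a requirement stated in this topic.

```python
import os


def build_headers(api_key, enable_sse=False, workspace=None):
    """Assemble the HTTP headers described in the request parameter table."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if enable_sse:
        # Alternatively, set Accept to text/event-stream.
        headers["X-DashScope-SSE"] = "enable"
    if workspace:
        # Required when the API key belongs to a RAM user.
        headers["X-DashScope-WorkSpace"] = workspace
    return headers


# Assumption: the key is stored in the DASHSCOPE_API_KEY environment variable.
api_key = os.environ.get("DASHSCOPE_API_KEY", "$your-dashscope-api-key")
headers = build_headers(api_key, enable_sse=True)
# Print only the header names so that the key does not appear in logs.
print(sorted(headers))
```

The returned dictionary can be passed as the headers argument of requests.post in the script above.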
Sample responses with SSE disabled
Response when the result_format parameter is set to text
{
"output":{
"text":"If you are in China, I suggest that you go to the Summer Palace in Beijing... for walking and enjoying the scenery.",
"finish_reason":"stop"
},
"usage":{
"output_tokens":380,
"input_tokens":633
},
"request_id":"d89c06fb-46a1-47b6-acb9-bfb17f814969"
}
Response when the result_format parameter is set to message
{
"output":{
"choices":[
{
"finish_reason":"stop",
"message":{
"role":"assistant",
"content":"If you are in China, I suggest that you go to the Summer Palace in Beijing... for walking and enjoying the scenery."
}
}
]
},
"usage":{
"output_tokens":380,
"input_tokens":633
},
"request_id":"d89c06fb-46a1-47b6-acb9-bfb17f814969"
}
Response when a tool is called
{
"status_code": 200,
"request_id": "a2b49cd7-ce21-98ff-87ac-b00cc590dc5e",
"code": "",
"message": "",
"output": {
"text": null,
"finish_reason": null,
"choices": [
{
"finish_reason": "tool_calls",
"message": {
"role": "assistant",
"content": "",
"tool_calls":[
{
"function": {
"name": "get_current_weather",
"arguments": "{\"properties\": {\"location\": \"Beijing\"}}"
},
"id": "",
"type": "function"
}
]
}
}
]
},
"usage": {
"input_tokens": 12,
"output_tokens": 98,
"total_tokens": 110
}
}
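After a response like the one above is received, the output of the tool is submitted to the model as a message whose role is tool. The following sketch builds that message according to the input.messages.name description. The helper append_tool_result and the sample tool output are hypothetical, and appending the assistant message before the tool message is an assumption of this example.

```python
def append_tool_result(messages, assistant_message, tool_output):
    """Append the assistant tool call and the tool result to the conversation."""
    tool_call = assistant_message["tool_calls"][0]
    # Assumption: the assistant message that requested the tool is kept in the history.
    messages.append(assistant_message)
    messages.append({
        "role": "tool",
        # name must match tool_calls[i].function.name from the previous response.
        "name": tool_call["function"]["name"],
        # content carries the output generated by the tool.
        "content": str(tool_output),
    })
    return messages


assistant_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [{
        "function": {
            "name": "get_current_weather",
            "arguments": "{\"properties\": {\"location\": \"Beijing\"}}",
        },
        "id": "",
        "type": "function",
    }],
}
messages = [{"role": "user", "content": "Weather in Beijing"}]
append_tool_result(messages, assistant_message, "It is sunny in Beijing today.")
print(messages[-1])
```

The resulting messages list is sent in the next request together with the same tools parameter, because the tools parameter must be specified in every round of a function call process.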
Sample requests with SSE enabled
The following sample code shows how to use a cURL command or a Python script to call a Qwen model when SSE is enabled. The output is returned in chunks, similar to the streaming output mode of the SDK.
You must replace $your-dashscope-api-key in the sample code with your API key.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation' \
--header 'Authorization: Bearer $your-dashscope-api-key' \
--header 'Content-Type: application/json' \
--header 'X-DashScope-SSE: enable' \
--data '{
"model": "qwen-turbo",
"input":{
"messages":[
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello, which park is closest to me?"
}
]
},
"parameters": {
"result_format": "message",
"incremental_output":true
}
}'
import requests
url = 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation'
headers = {'Content-Type': 'application/json',
'Authorization':'Bearer $your-dashscope-api-key',
'X-DashScope-SSE': 'enable'}
body = {
'model': 'qwen-turbo',
"input":{
"messages":[
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello"
}]
},
'parameters':{'incremental_output':True}
}
response = requests.post(url, headers=headers, json=body)
print(response.text)
Sample responses with SSE enabled
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"Hello","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":28,"input_tokens":27,"output_tokens":1},"request_id":"c13ac6fc-9281-9ac4-9f1d-003a38c48e02"}
id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":29,"input_tokens":27,"output_tokens":2},"request_id":"c13ac6fc-9281-9ac4-9f1d-003a38c48e02"}
... ... ... ...
... ... ... ...
id:12
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":91,"input_tokens":27,"output_tokens":64},"request_id":"c13ac6fc-9281-9ac4-9f1d-003a38c48e02"}
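Each event in the stream above carries its JSON payload on the line that starts with data:. The following sketch parses such a stream and joins the incremental chunks into one answer. It works on a hard-coded copy of the three events shown above (the elided events in between are left out), and parse_sse is a hypothetical helper, not a complete SSE client.

```python
import json


def parse_sse(raw):
    """Yield the decoded JSON payload of every data: line in an SSE stream."""
    for line in raw.splitlines():
        # Lines such as id:, event:, and comment lines starting with ":" are skipped.
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):])


sample_stream = """id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"Hello","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":28,"input_tokens":27,"output_tokens":1},"request_id":"c13ac6fc-9281-9ac4-9f1d-003a38c48e02"}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":29,"input_tokens":27,"output_tokens":2},"request_id":"c13ac6fc-9281-9ac4-9f1d-003a38c48e02"}

id:12
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":91,"input_tokens":27,"output_tokens":64},"request_id":"c13ac6fc-9281-9ac4-9f1d-003a38c48e02"}
"""

answer_parts = []
finish_reason = None
for event in parse_sse(sample_stream):
    choice = event["output"]["choices"][0]
    # With incremental_output set to true, each chunk holds only the new text.
    answer_parts.append(choice["message"]["content"])
    finish_reason = choice["finish_reason"]

answer = "".join(answer_parts)
print(answer, finish_reason)
```

Note that intermediate events in this sample report finish_reason as the string "null", so a client can compare against "stop" to detect the last chunk.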
Sample error responses
If an error occurs during a request, the error code and error message are returned for the code and message parameters.
{
"code":"InvalidApiKey",
"message":"Invalid API-key provided.",
"request_id":"fb53c4ec-1c12-4fc4-a580-cdb7c3261fc1"
}
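A client can detect a failed request by checking whether the code parameter is present in the response body. The following sketch turns such a body into a Python exception. ModelRequestError and raise_for_error are hypothetical names introduced for this example, and the body is assumed to be already parsed into a dictionary.

```python
class ModelRequestError(Exception):
    """Raised when a response body contains an error code (hypothetical class)."""

    def __init__(self, code, message, request_id=None):
        super().__init__(f"{code}: {message} (request_id={request_id})")
        self.code = code
        self.message = message
        self.request_id = request_id


def raise_for_error(payload):
    """Return the payload unchanged, or raise if it describes an error."""
    # Successful responses either omit code or return it as an empty string.
    if payload.get("code"):
        raise ModelRequestError(
            payload["code"], payload.get("message", ""), payload.get("request_id"))
    return payload


error_payload = {
    "code": "InvalidApiKey",
    "message": "Invalid API-key provided.",
    "request_id": "fb53c4ec-1c12-4fc4-a580-cdb7c3261fc1",
}

try:
    raise_for_error(error_payload)
except ModelRequestError as error:
    print(error)
```

Keeping request_id on the exception makes it easier to report a failed call when you contact technical support.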