In scenarios such as intelligent customer service and role-playing, a model must use previous interactions to generate coherent responses. The Qwen API is stateless and does not automatically save the conversation state. To implement a multi-turn conversation, you must pass the conversation history with each request. This topic explains how to create multi-turn conversations and provides methods for managing the conversation context and reducing token consumption.
How it works
To implement a multi-turn conversation, maintain a messages array. In each round, append the user's latest question and the model's response to this array. Then, use the updated array as the input for the next request.
The following example shows how the state of the messages array changes during a multi-turn conversation:
First round

Add the user's question to the messages array.

// Use a text model
[
    {"role": "user", "content": "Recommend a sci-fi movie about space exploration."}
]
// Use a multimodal model, for example, Qwen-VL
// {"role": "user",
//  "content": [{"type": "image_url", "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png"}},
//              {"type": "text", "text": "What products are shown in the image?"}]
// }

Second round

Add the model's response and the user's latest question to the messages array.

// Use a text model
[
    {"role": "user", "content": "Recommend a sci-fi movie about space exploration."},
    {"role": "assistant", "content": "I recommend 'XXX'. It is a classic sci-fi work."},
    {"role": "user", "content": "Who is the director of this movie?"}
]
// Use a multimodal model, for example, Qwen-VL
// [
//     {"role": "user", "content": [
//         {"type": "image_url", "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png"}},
//         {"type": "text", "text": "What products are shown in the image?"}]},
//     {"role": "assistant", "content": "The image shows three items: a pair of light blue overalls, a blue and white striped short-sleeve shirt, and a pair of white sneakers."},
//     {"role": "user", "content": "What style are they?"}
// ]
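In code, every round repeats the same three steps: append the user's message, call the model, and append the model's reply. The following is a minimal Python sketch, assuming client is an OpenAI-compatible client configured as in the Getting started section below:

# Minimal sketch of one conversation round; `client` is assumed to be an
# OpenAI-compatible client configured as in the Getting started section.
def ask(client, messages, user_input, model="qwen-plus"):
    # Append the user's latest question to the conversation history
    messages.append({"role": "user", "content": user_input})
    completion = client.chat.completions.create(model=model, messages=messages)
    reply = completion.choices[0].message.content
    # Store the response so the next request carries the full history
    messages.append({"role": "assistant", "content": reply})
    return reply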
Getting started
OpenAI compatible
Python
import os
from openai import OpenAI
def get_response(messages):
client = OpenAI(
# API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
# For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
completion = client.chat.completions.create(model="qwen-plus", messages=messages)
return completion
# Initialize a messages array
messages = [
{
"role": "system",
"content": """You are a salesperson at the Bailian mobile phone store. You are responsible for recommending phones to users. The phones have two parameters: screen size (including 6.1-inch, 6.5-inch, and 6.7-inch) and resolution (including 2K and 4K).
You can only ask the user for one parameter at a time. If the user does not provide complete information, you need to ask a follow-up question to get the missing parameter. When all parameters are collected, you must say: I have understood your purchase intention. Please wait.""",
}
]
assistant_output = "Welcome to the Bailian mobile phone store. What screen size are you looking for?"
print(f"Model output: {assistant_output}\n")
while "I have understood your purchase intention" not in assistant_output:
user_input = input("Please enter: ")
# Add the user's question to the messages list
messages.append({"role": "user", "content": user_input})
assistant_output = get_response(messages).choices[0].message.content
# Add the model's response to the messages list
messages.append({"role": "assistant", "content": assistant_output})
print(f"Model output: {assistant_output}")
print("\n")
Node.js
import OpenAI from "openai";
import { createInterface } from 'readline/promises';
// The following is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/compatible-mode/v1
const BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1";
// API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: BASE_URL
});
async function getResponse(messages) {
try {
const completion = await openai.chat.completions.create({
// For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
model: "qwen-plus",
messages: messages,
});
return completion.choices[0].message.content;
} catch (error) {
console.error("Error fetching response:", error);
throw error; // Re-throw the exception for handling by the upper layer
}
}
// Initialize the messages array
const messages = [
{
"role": "system",
"content": `You are a salesperson at the Bailian mobile phone store. You are responsible for recommending phones to users. The phones have two parameters: screen size (including 6.1-inch, 6.5-inch, and 6.7-inch) and resolution (including 2K and 4K).
You can only ask the user for one parameter at a time. If the user does not provide complete information, you need to ask a follow-up question to get the missing parameter. When all parameters are collected, you must say: I have understood your purchase intention. Please wait.`,
}
];
let assistant_output = "Welcome to the Bailian mobile phone store. What screen size are you looking for?";
console.log(assistant_output);
const readline = createInterface({
input: process.stdin,
output: process.stdout
});
(async () => {
while (!assistant_output.includes("I have understood your purchase intention")) {
const user_input = await readline.question("Please enter: ");
messages.push({ role: "user", content: user_input});
try {
const response = await getResponse(messages);
assistant_output = response;
messages.push({ role: "assistant", content: assistant_output });
console.log(assistant_output);
console.log("\n");
} catch (error) {
console.error("An error occurred while fetching the response:", error);
}
}
readline.close();
})();
curl
The URL in the sample code is for the Singapore region. To use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions.
# ======= Important =======
# API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-plus",
"messages":[
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello"
},
{
"role": "assistant",
"content": "Hello, I am Qwen."
},
{
"role": "user",
"content": "What can you do?"
}
]
}'
DashScope
Python
The sample code uses the example of a mobile phone store salesperson. The salesperson engages in a multi-turn conversation with a customer to determine their purchase intentions and then ends the session.
import os
from dashscope import Generation
import dashscope
# The following is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
def get_response(messages):
response = Generation.call(
# API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
# For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
model="qwen-plus",
messages=messages,
result_format="message",
)
return response
messages = [
{
"role": "system",
"content": """You are a salesperson at the Bailian mobile phone store. You are responsible for recommending phones to users. The phones have two parameters: screen size (including 6.1-inch, 6.5-inch, and 6.7-inch) and resolution (including 2K and 4K).
You can only ask the user for one parameter at a time. If the user does not provide complete information, you need to ask a follow-up question to get the missing parameter. When all parameters are collected, you must say: I have understood your purchase intention. Please wait.""",
}
]
assistant_output = "Welcome to the Bailian mobile phone store. What screen size are you looking for?"
print(f"Model output: {assistant_output}\n")
while "I have understood your purchase intention" not in assistant_output:
user_input = input("Please enter: ")
# Add the user's question to the messages list
messages.append({"role": "user", "content": user_input})
assistant_output = get_response(messages).output.choices[0].message.content
# Add the model's response to the messages list
messages.append({"role": "assistant", "content": assistant_output})
print(f"Model output: {assistant_output}")
print("\n")
Java
import java.util.ArrayList;
import java.util.List;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import java.util.Scanner;
import com.alibaba.dashscope.protocol.Protocol;
public class Main {
public static GenerationParam createGenerationParam(List<Message> messages) {
return GenerationParam.builder()
// API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
.model("qwen-plus")
.messages(messages)
.resultFormat(GenerationParam.ResultFormat.MESSAGE)
.build();
}
public static GenerationResult callGenerationWithMessages(GenerationParam param) throws ApiException, NoApiKeyException, InputRequiredException {
// The following is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/api/v1
Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
return gen.call(param);
}
public static void main(String[] args) {
try {
List<Message> messages = new ArrayList<>();
messages.add(createMessage(Role.SYSTEM, "You are a helpful assistant."));
for (int i = 0; i < 3; i++) {
Scanner scanner = new Scanner(System.in);
System.out.print("Please enter: ");
String userInput = scanner.nextLine();
if ("exit".equalsIgnoreCase(userInput)) {
break;
}
messages.add(createMessage(Role.USER, userInput));
GenerationParam param = createGenerationParam(messages);
GenerationResult result = callGenerationWithMessages(param);
System.out.println("Model output: "+result.getOutput().getChoices().get(0).getMessage().getContent());
messages.add(result.getOutput().getChoices().get(0).getMessage());
}
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
e.printStackTrace();
}
System.exit(0);
}
private static Message createMessage(Role role, String content) {
return Message.builder().role(role.getValue()).content(content).build();
}
}
curl
The URL in the sample code is for the Singapore region. To use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation.
# ======= Important =======
# API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-plus",
"input":{
"messages":[
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello"
},
{
"role": "assistant",
"content": "Hello, I am Qwen."
},
{
"role": "user",
"content": "What can you do?"
}
]
}
}'
For multimodal models
This section applies to multimodal models such as Qwen-VL. For information about implementing Qwen-Omni, see omni-modal. Qwen-VL-OCR and Qwen3-Omni-Captioner are designed for specific single-turn tasks and do not support multi-turn conversations.
Multimodal models support adding content such as images and audio to conversations. The implementation of multi-turn conversations for these models differs from text models in the following ways:
Construction of user messages: User messages for multimodal models can contain multimodal information, such as images and audio, in addition to text.
DashScope SDK interface: When using the DashScope Python SDK, call the MultiModalConversation interface. When using the DashScope Java SDK, use the MultiModalConversation class.
OpenAI compatible
Python
from openai import OpenAI
import os
client = OpenAI(
# API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)
messages = [
{"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png"
},
},
{"type": "text", "text": "What products are shown in the image?"},
],
}
]
completion = client.chat.completions.create(
model="qwen3-vl-plus", # You can replace this with other multimodal models and modify the messages as needed
messages=messages,
)
print(f"First round output: {completion.choices[0].message.content}")
assistant_message = completion.choices[0].message
messages.append(assistant_message.model_dump())
messages.append({
"role": "user",
"content": [
{
"type": "text",
"text": "What style are they?"
}
]
})
completion = client.chat.completions.create(
model="qwen3-vl-plus",
messages=messages,
)
print(f"Second round output: {completion.choices[0].message.content}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx",
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
let messages = [
{
role: "user", content: [
{ type: "image_url", image_url: { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png" } },
{ type: "text", text: "What products are shown in the image?" },
]
}]
async function main() {
let response = await openai.chat.completions.create({
model: "qwen3-vl-plus", // You can replace this with other multimodal models and modify the messages as needed
messages: messages
});
console.log(`First round output: ${response.choices[0].message.content}`);
messages.push(response.choices[0].message);
messages.push({"role": "user", "content": "Write a poem describing this scene"});
response = await openai.chat.completions.create({
model: "qwen3-vl-plus",
messages: messages
});
console.log(`Second round output: ${response.choices[0].message.content}`);
}
main()
curl
# ======= Important =======
# The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png"
}
},
{
"type": "text",
"text": "What products are shown in the image?"
}
]
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "The image shows three items: a pair of light blue overalls, a blue and white striped short-sleeve shirt, and a pair of white sneakers."
}
]
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "What style are they?"
}
]
}
]
}'
DashScope
Python
import os
from dashscope import MultiModalConversation
import dashscope
# The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{
"role": "user",
"content": [
{
"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png"
},
{"text": "What products are shown in the image?"},
],
}
]
response = MultiModalConversation.call(
# API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen3-vl-plus', # You can replace this with other multimodal models and modify the messages as needed
messages=messages
)
print(f"Model first round output: {response.output.choices[0].message.content[0]['text']}")
messages.append(response['output']['choices'][0]['message'])
user_msg = {"role": "user", "content": [{"text": "What style are they?"}]}
messages.append(user_msg)
response = MultiModalConversation.call(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen3-vl-plus',
messages=messages
)
print(f"Model second round output: {response.output.choices[0].message.content[0]['text']}")
Java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static final String modelName = "qwen3-vl-plus"; // You can replace this with other multimodal models and modify the messages as needed
public static void MultiRoundConversationCall() throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png"),
Collections.singletonMap("text", "What products are shown in the image?"))).build();
List<MultiModalMessage> messages = new ArrayList<>();
messages.add(userMessage);
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(modelName)
.messages(messages)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println("First round output: " + result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
// Add the model's response to the conversation history
messages.add(result.getOutput().getChoices().get(0).getMessage());
MultiModalMessage msg = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("text", "What style are they?"))).build();
messages.add(msg);
param.setMessages((List)messages);
result = conv.call(param);
System.out.println("Second round output: " + result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
MultiRoundConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
# ======= Important =======
# API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"input":{
"messages":[
{
"role": "user",
"content": [
{"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png"},
{"text": "What products are shown in the image?"}
]
},
{
"role": "assistant",
"content": [
{"text": "The image shows three items: a pair of light blue overalls, a blue and white striped short-sleeve shirt, and a pair of white sneakers."}
]
},
{
"role": "user",
"content": [
{"text": "What style are they?"}
]
}
]
}
}'
For thinking models
Thinking models return two fields: reasoning_content (the thinking process) and content (the response). When updating the messages array, retain only the content field and ignore the reasoning_content field.
[
{"role": "user", "content": "Recommend a sci-fi movie about space exploration."},
{"role": "assistant", "content": "I recommend 'XXX'. It is a classic sci-fi work."}, # Do not add the reasoning_content field when adding to the context
{"role": "user", "content": "Who is the director of this movie?"}
]
For more information about thinking models, see Deep thinking, Visual understanding, and Visual reasoning.
For more information about implementing multi-turn conversations with Qwen3-Omni-Flash (thinking mode), see omni-modal.
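In streaming code, this means accumulating only the content deltas into the history. The following is a minimal sketch against the OpenAI-compatible interface, mirroring the full samples below (completion is a streaming response created with stream=True):

# Accumulate only the final answer; reasoning_content may be displayed,
# but it is never written back into the messages array.
answer_content = ""
for chunk in completion:
    if not chunk.choices:
        continue  # a final chunk may carry only usage statistics
    delta = chunk.choices[0].delta
    if getattr(delta, "reasoning_content", None):
        pass  # thinking process: print it if needed, but do not store it
    elif delta.content:
        answer_content += delta.content
# Only the content field goes back into the context
messages.append({"role": "assistant", "content": answer_content})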
OpenAI compatible
Python
Sample code
from openai import OpenAI
import os
# Initialize the OpenAI client
client = OpenAI(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)
messages = []
conversation_idx = 1
while True:
reasoning_content = "" # Define the complete thinking process
answer_content = "" # Define the complete response
is_answering = False # Determine whether to end the thinking process and start responding
print("="*20+f"Conversation Round {conversation_idx}"+"="*20)
conversation_idx += 1
user_msg = {"role": "user", "content": input("Enter your message: ")}
messages.append(user_msg)
# Create a chat completion request
completion = client.chat.completions.create(
# You can replace this with other deep thinking models as needed
model="qwen-plus",
messages=messages,
extra_body={"enable_thinking": True},
stream=True,
# stream_options={
# "include_usage": True
# }
)
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
for chunk in completion:
# If chunk.choices is empty, print usage
if not chunk.choices:
print("\nUsage:")
print(chunk.usage)
else:
delta = chunk.choices[0].delta
# Print the thinking process
if hasattr(delta, 'reasoning_content') and delta.reasoning_content is not None:
print(delta.reasoning_content, end='', flush=True)
reasoning_content += delta.reasoning_content
else:
# Start responding
if delta.content != "" and is_answering is False:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
is_answering = True
# Print the response process
print(delta.content, end='', flush=True)
answer_content += delta.content
# Add the content of the model's response to the context
messages.append({"role": "assistant", "content": answer_content})
print("\n")
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';
import readline from 'readline/promises';
// Initialize the readline interface
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout
});
// Initialize the openai client
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY, // Read from environment variables
baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
let messages = [];
let conversationIdx = 1;
async function main() {
while (true) {
console.log("=".repeat(20) + `Conversation Round ${conversationIdx}` + "=".repeat(20));
conversationIdx++;
// Read user input
const userInput = await rl.question("Enter your message: ");
messages.push({ role: 'user', content: userInput });
// Reset state
reasoningContent = '';
answerContent = '';
isAnswering = false;
try {
const stream = await openai.chat.completions.create({
// You can replace this with other deep thinking models as needed
model: 'qwen-plus',
messages: messages,
enable_thinking: true,
stream: true,
// stream_options:{
// include_usage: true
// }
});
console.log("\n" + "=".repeat(20) + "Thinking Process" + "=".repeat(20) + "\n");
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\nUsage:');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Process the thinking process
if (delta.reasoning_content) {
process.stdout.write(delta.reasoning_content);
reasoningContent += delta.reasoning_content;
}
// Process the formal response
if (delta.content) {
if (!isAnswering) {
console.log('\n' + "=".repeat(20) + "Complete Response" + "=".repeat(20) + "\n");
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
// Add the complete response to the message history
messages.push({ role: 'assistant', content: answerContent });
console.log("\n");
} catch (error) {
console.error('Error:', error);
}
}
}
// Start the program
main().catch(console.error);
HTTP
Sample code
curl
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-plus",
"messages": [
{
"role": "user",
"content": "Hello"
},
{
"role": "assistant",
"content": "Hello! Nice to meet you. Is there anything I can help you with?"
},
{
"role": "user",
"content": "Who are you?"
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"enable_thinking": true
}'
DashScope
Python
Sample code
import os
import dashscope
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"
messages = []
conversation_idx = 1
while True:
print("=" * 20 + f"Conversation Round {conversation_idx}" + "=" * 20)
conversation_idx += 1
user_msg = {"role": "user", "content": input("Enter your message: ")}
messages.append(user_msg)
response = dashscope.Generation.call(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
# This example uses qwen-plus. You can replace it with other deep thinking models as needed
model="qwen-plus",
messages=messages,
enable_thinking=True,
result_format="message",
stream=True,
incremental_output=True
)
# Define the complete thinking process
reasoning_content = ""
# Define the complete response
answer_content = ""
# Determine whether to end the thinking process and start responding
is_answering = False
print("=" * 20 + "Thinking Process" + "=" * 20)
for chunk in response:
# If both the thinking process and the response are empty, ignore
if (chunk.output.choices[0].message.content == "" and
chunk.output.choices[0].message.reasoning_content == ""):
pass
else:
# If it is currently the thinking process
if (chunk.output.choices[0].message.reasoning_content != "" and
chunk.output.choices[0].message.content == ""):
print(chunk.output.choices[0].message.reasoning_content, end="",flush=True)
reasoning_content += chunk.output.choices[0].message.reasoning_content
# If it is currently the response
elif chunk.output.choices[0].message.content != "":
if not is_answering:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20)
is_answering = True
print(chunk.output.choices[0].message.content, end="",flush=True)
answer_content += chunk.output.choices[0].message.content
# Add the content of the model's response to the context
messages.append({"role": "assistant", "content": answer_content})
print("\n")
# To print the complete thinking process and complete response, uncomment and run the following code
# print("=" * 20 + "Complete Thinking Process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(f"{answer_content}")
Java
Sample code
// DashScope SDK version >= 2.19.4
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.List;
import com.alibaba.dashscope.protocol.Protocol;
public class Main {
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(GenerationResult message) {
if (message != null && message.getOutput() != null
&& message.getOutput().getChoices() != null
&& !message.getOutput().getChoices().isEmpty()
&& message.getOutput().getChoices().get(0) != null
&& message.getOutput().getChoices().get(0).getMessage() != null) {
String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
String content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (reasoning != null && !reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Thinking Process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (content != null && !content.isEmpty()) {
finalContent.append(content);
if (!isFirstPrint) {
System.out.println("\n====================Complete Response====================");
isFirstPrint = true;
}
System.out.print(content);
}
}
}
private static GenerationParam buildGenerationParam(List<Message> messages) {
return GenerationParam.builder()
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// This example uses qwen-plus. You can replace it with other model names as needed.
.model("qwen-plus")
.enableThinking(true)
.messages(messages)
.incrementalOutput(true)
.resultFormat("message")
.build();
}
public static void streamCallWithMessage(Generation gen, List<Message> messages)
throws NoApiKeyException, ApiException, InputRequiredException {
GenerationParam param = buildGenerationParam(messages);
Flowable<GenerationResult> result = gen.streamCall(param);
result.doOnError(throwable -> logger.error("Error occurred in stream processing: {}", throwable.getMessage(), throwable))
.blockingForEach(Main::handleGenerationResult);
}
public static void main(String[] args) {
try {
Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
Message userMsg1 = Message.builder()
.role(Role.USER.getValue())
.content("Hello")
.build();
Message assistantMsg = Message.builder()
.role(Role.ASSISTANT.getValue())
.content("Hello! Nice to meet you. Is there anything I can help you with?")
.build();
Message userMsg2 = Message.builder()
.role(Role.USER.getValue())
.content("Who are you")
.build();
List<Message> messages = Arrays.asList(userMsg1, assistantMsg, userMsg2);
streamCallWithMessage(gen, messages);
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage(), e);
} catch (Exception e) {
logger.error("Unexpected error occurred: {}", e.getMessage(), e);
} finally {
// Ensure the program exits normally
System.exit(0);
}
}
}
HTTP
Sample code
curl
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
"model": "qwen-plus",
"input":{
"messages":[
{
"role": "user",
"content": "Hello"
},
{
"role": "assistant",
"content": "Hello! Nice to meet you. Is there anything I can help you with?"
},
{
"role": "user",
"content": "Who are you?"
}
]
},
"parameters":{
"enable_thinking": true,
"incremental_output": true,
"result_format": "message"
}
}'
Going live
Multi-turn conversations accumulate tokens with every round and can exceed the model's maximum context window, which causes request errors. Use the following strategies to manage the context effectively and control costs.
1. Context management
The messages array grows with each conversation round and may eventually exceed the model's token limit. Use the following methods to manage the context length during a conversation.
1.1. Context truncation
When the conversation history becomes too long, keep only the most recent N rounds of conversation. This method is simple to implement but results in the loss of earlier conversation information.
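The following is a minimal Python sketch, assuming the first entry of messages is a system prompt that must always be kept and that each round appends one user and one assistant message:

def truncate_history(messages, max_rounds=5):
    # Keep the system prompt (if present) plus only the most recent N rounds.
    # Assumes one round = one user message + one assistant message.
    system = messages[:1] if messages and messages[0]["role"] == "system" else []
    rest = messages[len(system):]
    return system + rest[-2 * max_rounds:]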
1.2. Rolling summary
To dynamically compress the conversation history and control the context length without losing core information, summarize the context as the conversation progresses:
a. When the conversation history reaches a certain length, such as 70% of the maximum context length, extract an earlier part of the history, such as the first half. Then, make a separate API call to have the model generate a "memory summary" of this part.
b. When constructing the next request, replace the lengthy conversation history with the "memory summary" and append the most recent conversation rounds, as sketched below.
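A minimal Python sketch of steps a and b, assuming an OpenAI-compatible client as in the earlier examples; the split point, trigger threshold, and prompt wording are illustrative assumptions, not a fixed API:

# Call this when the history approaches the limit, for example at about
# 70% of the model's maximum context length.
def compress_history(client, messages, model="qwen-plus"):
    # Keep the system prompt; summarize the older half of the history.
    system = messages[:1] if messages and messages[0]["role"] == "system" else []
    history = messages[len(system):]
    mid = len(history) // 2
    old, recent = history[:mid], history[mid:]
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in old)
    summary = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": "Summarize the key facts, preferences, and decisions in "
                       "this conversation so it can serve as memory:\n" + transcript,
        }],
    ).choices[0].message.content
    # The summary stands in for the early rounds in all later requests
    return system + [{"role": "user", "content": f"Summary of the earlier conversation: {summary}"}] + recent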
1.3. Vectorized retrieval
A rolling summary can cause some information loss. To allow the model to recall relevant information from a large volume of conversation history, switch from linear context passing to on-demand retrieval:
a. After each conversation round, store the conversation in a vector store.
b. When a user asks a question, retrieve relevant conversation records based on similarity.
c. Combine the retrieved conversation records with the most recent user input and send the combined content to the model, as in the sketch below.
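The following sketch shows the shape of this flow. Here, embed() is a hypothetical placeholder for a real embedding model, and an in-memory list stands in for the vector store; a production setup would use an embedding endpoint and a dedicated vector database:

import math

conversation_store = []  # each entry: {"vector": [...], "text": "user: ...\nassistant: ..."}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def remember(round_text):
    # embed() is a hypothetical helper that calls a real embedding model
    conversation_store.append({"vector": embed(round_text), "text": round_text})

def recall(query, top_k=3):
    query_vector = embed(query)
    ranked = sorted(conversation_store,
                    key=lambda e: cosine_similarity(query_vector, e["vector"]),
                    reverse=True)
    return [e["text"] for e in ranked[:top_k]]

def build_messages(system_prompt, user_question):
    # Send only the retrieved records plus the latest question to the model
    context = "\n\n".join(recall(user_question))
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user",
         "content": f"Relevant earlier conversation:\n{context}\n\nQuestion: {user_question}"},
    ]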
2. Cost control
The number of input tokens increases with each conversation round, which significantly raises costs. Use the following cost management strategies.
2.1. Reduce input tokens
Use the context management strategies described previously to reduce input tokens and lower costs.
2.2. Use models that support context cache
When you make a multi-turn conversation request, the content in the messages array is repeatedly processed and billed. Alibaba Cloud Model Studio provides the context cache feature for models such as qwen-max and qwen-plus. This feature can reduce costs and improve response speed. We recommend that you prioritize using models that support context cache.
The context cache feature is enabled automatically. No code changes are required.
Error codes
If a call fails, see Error messages for troubleshooting.