Deep thinking models first outputs reasoning steps before generating a final answer, making them more accurate on tasks such as logic and computation. This topic describes how to call deep thinking models such as Qwen and DeepSeek.
Implementation guide
Alibaba Cloud Model Studio provides APIs for various deep thinking models, including hybrid-thinking and thinking-only modes.
Hybrid-thinking: Use the
enable_thinkingparameter to control whether to think:Set to
true: The model thinks before replying.Set to
false: The model replies directly.
OpenAI compatible
# Import dependencies and create a client... completion = client.chat.completions.create( model="qwen-plus", # Select a model messages=[{"role": "user", "content": "Who are you"}], # Because enable_thinking is not a standard OpenAI parameter, pass it through extra_body extra_body={"enable_thinking":True}, # Call with streaming output stream=True, # Make the last packet of the stream response contain token usage information stream_options={ "include_usage": True } )DashScope
# Import dependencies... response = Generation.call( # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx", api_key=os.getenv("DASHSCOPE_API_KEY"), # You can replace this with another deep thinking model as needed model="qwen-plus", messages=messages, result_format="message", enable_thinking=True, stream=True, incremental_output=True )Thinking-only: The model always thinks before replying, and this cannot be disabled. The request format is the same as for the hybrid-thinking mode, except you do not need to set the enable_thinking parameter.
The thinking process is returned in the reasoning_content field. The reply is returned in the content field. Because deep thinking models have a longer response time and most of them only support streaming output, the sample code in this topic all uses streaming output.
Model availability
Qwen3
Commercial edition
Qwen-Max series (hybrid-thinking mode, disabled by default): qwen3-max-preview
Qwen-Plus series (hybrid-thinking mode, disabled by default): qwen-plus, qwen-plus-latest, qwen-plus-2025-04-28, and later snapshot models
Qwen-Flash series (hybrid-thinking mode, disabled by default): qwen-flash, qwen-flash-2025-07-28, and later snapshot models
Qwen-Turbo series (hybrid-thinking mode, disabled by default): qwen-turbo, qwen-turbo-latest, qwen-turbo-2025-04-28, and later snapshot models
Open-source edition
Hybrid-thinking mode, enabled by default: qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b
Thinking-only mode: qwen3-next-80b-a3b-thinking, qwen3-235b-a22b-thinking-2507, qwen3-30b-a3b-thinking-2507
QwQ (based on Qwen2.5)
Thinking-only mode: qwq-plus, qwq-plus-latest, qwq-plus-2025-03-05, qwq-32b
DeepSeek (Beijing region)
Hybrid-thinking mode, disabled by default: deepseek-v3.2, deepseek-v3.2-exp, deepseek-v3.1
Thinking-only mode: deepseek-r1, deepseek-r1-0528, deepseek-r1 distilled model
Kimi (Beijing region)
Thinking-only mode: kimi-k2-thinking
For more information about model names, context, prices, and snapshot versions, see Models. For more information about rate limiting, see Rate limits.
Getting started
Prerequisites: You have created an API key and export the API key as an environment variable. If you use an SDK, install the OpenAI or DashScope SDK. The DashScope Java SDK must be version 2.19.4 or later.
Run the following code to call the qwen-plus model in thinking mode with streaming output.
OpenAI compatible
Python
Sample code
from openai import OpenAI
import os
# Initialize the OpenAI client
client = OpenAI(
# If you have not configured the environment variable, replace the following with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
messages = [{"role": "user", "content": "Who are you"}]
completion = client.chat.completions.create(
model="qwen-plus", # You can replace this with another deep thinking model as needed
messages=messages,
extra_body={"enable_thinking": True},
stream=True,
stream_options={
"include_usage": True
},
)
reasoning_content = "" # Full thinking process
answer_content = "" # Full reply
is_answering = False # Indicates whether the reply phase has started
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
for chunk in completion:
if not chunk.choices:
print("\nUsage:")
print(chunk.usage)
continue
delta = chunk.choices[0].delta
# Collect only the thinking content
if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
if not is_answering:
print(delta.reasoning_content, end="", flush=True)
reasoning_content += delta.reasoning_content
# When content is received, start replying
if hasattr(delta, "content") and delta.content:
if not is_answering:
print("\n" + "=" * 20 + "Full Reply" + "=" * 20 + "\n")
is_answering = True
print(delta.content, end="", flush=True)
answer_content += delta.content
Response
====================Thinking Process====================
Okay, the user is asking "Who are you". I need to give an accurate and friendly answer. First, I must confirm my identity, which is Qwen, developed by the Tongyi Lab under Alibaba Group. Next, I should explain my main functions, such as answering questions, creating text, and logical reasoning. I should also maintain a friendly tone and avoid being too technical to make the user feel at ease. I must also be careful not to use complex terms and ensure the answer is concise and clear. Additionally, I might need to add some interactive elements, inviting the user to ask questions to encourage further communication. Finally, I will check if I have missed any important information, such as my Chinese name 'Tongyi Qianwen' and English name 'Qwen', along with my parent company and lab. I need to ensure the answer is comprehensive and meets the user's expectations.
====================Full Reply====================
Hello! I am Qwen, an ultra-large language model independently developed by the Tongyi Lab under Alibaba Group. I can answer questions, create text, perform logical reasoning, and code, aiming to provide users with high-quality information and services. You can call me Qwen, or just Tongyi Qianwen. How can I help you?Node.js
Sample code
import OpenAI from "openai";
import process from 'process';
// Initialize the OpenAI client
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY, // Read from environment variable
baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
async function main() {
try {
const messages = [{ role: 'user', content: 'Who are you' }];
const stream = await openai.chat.completions.create({
model: 'qwen-plus',
messages,
stream: true,
enable_thinking: true
});
console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\nUsage:');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Collect only the thinking content
if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
if (!isAnswering) {
process.stdout.write(delta.reasoning_content);
}
reasoningContent += delta.reasoning_content;
}
// When content is received, start replying
if (delta.content !== undefined && delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Full Reply' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
} catch (error) {
console.error('Error:', error);
}
}
main();Response
====================Thinking Process====================
Okay, the user is asking "Who are you". I need to answer about my identity. First, I should clearly state that I am Qwen, an ultra-large language model developed by Alibaba Cloud. Next, I can mention my main functions, such as answering questions, creating text, and logical reasoning. I should also emphasize my multilingual support, including Chinese and English, so the user knows I can handle requests in different languages. Additionally, I might need to explain my application scenarios, such as helping with study, work, and daily life. However, the user's question is quite direct, so I probably don't need too much detailed information. I should keep it concise and clear. At the same time, I need to ensure a friendly tone and invite the user to ask further questions. I should check if I have missed any important information, such as my version or latest updates, but the user probably doesn't need that much detail. Finally, I will confirm that the answer is accurate and contains no errors.
====================Full Reply====================
I am Qwen, an ultra-large language model independently developed by the Tongyi Lab under Alibaba Group. I can perform various tasks such as answering questions, creating text, logical reasoning, and coding. I support multiple languages, including Chinese and English. If you have any questions or need help, feel free to let me know!HTTP
Sample code
curl
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-plus",
"messages": [
{
"role": "user",
"content": "Who are you"
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"enable_thinking": true
}'Response
data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
.....
data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":10,"completion_tokens":360,"total_tokens":370},"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
data: [DONE]DashScope
Python
Sample code
import os
from dashscope import Generation
import dashscope
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"
messages = [{"role": "user", "content": "Who are you?"}]
completion = Generation.call(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen-plus",
messages=messages,
result_format="message",
enable_thinking=True,
stream=True,
incremental_output=True,
)
# Define the complete thinking process.
reasoning_content = ""
# Define the complete response.
answer_content = ""
# Check if the thinking process is finished and the response has started.
is_answering = False
print("=" * 20 + "Thinking Process" + "=" * 20)
for chunk in completion:
# If both the thinking process and the response are empty, ignore.
if (
chunk.output.choices[0].message.content == ""
and chunk.output.choices[0].message.reasoning_content == ""
):
pass
else:
# If it is currently the thinking process.
if (
chunk.output.choices[0].message.reasoning_content != ""
and chunk.output.choices[0].message.content == ""
):
print(chunk.output.choices[0].message.reasoning_content, end="", flush=True)
reasoning_content += chunk.output.choices[0].message.reasoning_content
# If it is currently the response.
elif chunk.output.choices[0].message.content != "":
if not is_answering:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20)
is_answering = True
print(chunk.output.choices[0].message.content, end="", flush=True)
answer_content += chunk.output.choices[0].message.content
# To print the complete thinking process and response, uncomment and run the following code.
# print("=" * 20 + "Complete Thinking Process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(f"{answer_content}")
Response
====================Thinking Process====================
Okay, the user is asking, "Who are you?" I need to answer this question. First, I must clarify my identity: I am Qwen, a large-scale language model developed by Alibaba Cloud. Next, I need to explain my functions and uses, such as answering questions, creating text, and logical reasoning. I should also emphasize that my goal is to be a helpful assistant to the user, providing help and support.
When responding, I should maintain a conversational tone and avoid technical jargon or complex sentences. I can add friendly phrases, like "Hello there!~", to make the conversation more natural. Also, I must ensure the information is accurate and does not omit key points, such as my developer, main functions, and use cases.
I also need to consider potential follow-up questions from the user, such as specific application examples or technical details. So, I can subtly plant seeds in my answer to encourage further questions. For example, mentioning "Whether it's a question about daily life or a professional topic, I can do my best to help" is both comprehensive and open-ended.
Finally, I will check if the response is fluent and free of repetitive or redundant information, ensuring it is concise and clear. I will also maintain a balance between being friendly and professional, so the user finds me both approachable and reliable.
====================Complete Response====================
Hello there!~ I am Qwen, a large-scale language model developed by Alibaba Cloud. I can answer questions, create text, perform logical reasoning, write code, and more. My purpose is to provide help and support to users. Whether you have questions about daily life or professional topics, I will do my best to assist. How can I help you?Java
Sample code
// DashScope SDK version >= 2.19.4
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(GenerationResult message) {
String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
String content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (!reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Thinking Process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (!content.isEmpty()) {
finalContent.append(content);
if (!isFirstPrint) {
System.out.println("\n====================Complete Response====================");
isFirstPrint = true;
}
System.out.print(content);
}
}
private static GenerationParam buildGenerationParam(Message userMsg) {
return GenerationParam.builder()
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-plus")
.enableThinking(true)
.incrementalOutput(true)
.resultFormat("message")
.messages(Arrays.asList(userMsg))
.build();
}
public static void streamCallWithMessage(Generation gen, Message userMsg)
throws NoApiKeyException, ApiException, InputRequiredException {
GenerationParam param = buildGenerationParam(userMsg);
Flowable<GenerationResult> result = gen.streamCall(param);
result.blockingForEach(message -> handleGenerationResult(message));
}
public static void main(String[] args) {
try {
Generation gen = new Generation();
Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
streamCallWithMessage(gen, userMsg);
// Print the final result.
// if (reasoningContent.length() > 0) {
// System.out.println("\n====================Complete Response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
}
}Response
====================Thinking Process====================
Okay, the user is asking "Who are you?", and I need to answer based on my predefined settings. First, my role is Qwen, a large-scale language model from Alibaba Group. I should keep the tone conversational, simple, and easy to understand.
The user might be new to me or wants to confirm my identity. I should first state who I am directly, then briefly explain my functions and uses, such as answering questions, creating text, and coding. I should also mention my multilingual support so the user knows I can handle requests in different languages.
Also, according to the guidelines, I should maintain a human-like persona, so the tone should be friendly. I might use emojis to add a touch of warmth. At the same time, I might need to guide the user to ask more questions or use my features, for example, by asking what they need help with.
I need to be careful not to use complex terminology and avoid being verbose. I will check for any missed key points, such as multilingual support and specific capabilities. I must ensure the response meets all requirements, including being conversational and concise.
====================Complete Response====================
Hello! I am Qwen, a large-scale language model from Alibaba Group. I can answer questions and create text, such as stories, official documents, emails, and playbooks. I can also perform logical reasoning, write code, express opinions, and play games. I am proficient in multiple languages, including but not limited to Chinese, English, German, French, and Spanish. Is there anything I can help you with?HTTP
Sample code
curl
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === Delete this comment before execution ===
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
"model": "qwen-plus",
"input":{
"messages":[
{
"role": "user",
"content": "Who are you?"
}
]
},
"parameters":{
"enable_thinking": true,
"incremental_output": true,
"result_format": "message"
}
}'Response
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Hmm","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"input_tokens":11,"output_tokens":3},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"input_tokens":11,"output_tokens":4},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:3
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"the user","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":16,"input_tokens":11,"output_tokens":5},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:4
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":" asks","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":17,"input_tokens":11,"output_tokens":6},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:5
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":" '","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":18,"input_tokens":11,"output_tokens":7},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
......
id:358
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"help","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":373,"input_tokens":11,"output_tokens":362},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:359
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":",","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":374,"input_tokens":11,"output_tokens":363},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:360
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" welcome","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":375,"input_tokens":11,"output_tokens":364},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:361
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" anytime","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":376,"input_tokens":11,"output_tokens":365},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:362
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" to tell me","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":377,"input_tokens":11,"output_tokens":366},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:363
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:364
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}Core capabilities
Switch between thinking and non-thinking modes
Thinking generally improves response quality but increases response latency and cost. When using a hybrid thinking model, dynamically switch between thinking and non-thinking modes based on question complexity without changing the model:
For simple tasks that do not require complex reasoning, such as daily chats or simple Q&A, set
enable_thinkingtofalse.For complex tasks that require reasoning, such as logical inference, code generation, or solving math problems, set
enable_thinkingtotrue.
OpenAI compatible
enable_thinking is not a standard OpenAI parameter. If you use the OpenAI Python SDK, pass this parameter through extra_body. In the Node.js SDK, pass it as a top-level parameter.
Python
Sample code
from openai import OpenAI
import os
# Initialize the OpenAI client
client = OpenAI(
# If the environment variable is not configured, replace it with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
messages = [{"role": "user", "content": "Who are you"}]
completion = client.chat.completions.create(
model="qwen-plus",
messages=messages,
# Set enable_thinking through extra_body to enable the thinking process
extra_body={"enable_thinking": True},
stream=True,
stream_options={
"include_usage": True
},
)
reasoning_content = "" # Full thinking process
answer_content = "" # Full response
is_answering = False # Indicates whether the response phase has started
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
for chunk in completion:
if not chunk.choices:
print("\n" + "=" * 20 + "Token Usage" + "=" * 20 + "\n")
print(chunk.usage)
continue
delta = chunk.choices[0].delta
# Collect only the thinking content
if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
if not is_answering:
print(delta.reasoning_content, end="", flush=True)
reasoning_content += delta.reasoning_content
# After receiving content, start generating the response
if hasattr(delta, "content") and delta.content:
if not is_answering:
print("\n" + "=" * 20 + "Full Response" + "=" * 20 + "\n")
is_answering = True
print(delta.content, end="", flush=True)
answer_content += delta.content
Response
====================Thinking Process====================
Okay, the user is asking 'Who are you'. I need to figure out what they want to know. They might be interacting with me for the first time or want to confirm my identity. I should start by introducing myself as Qwen, developed by Tongyi Lab. Then, I should explain my capabilities, such as answering questions, creating text, and programming, so the user understands how I can help. I should also mention that I support multiple languages, so international users know they can communicate in different languages. Finally, I should be friendly and invite them to ask more questions to encourage further interaction. I need to be concise and clear, avoiding too much technical jargon to make it easy for the user to understand. The user probably wants a quick overview of my abilities, so I'll focus on my functions and uses. I should also check if I've missed any information, like mentioning Alibaba Group or more technical details. However, the user probably just needs basic information, not an in-depth explanation. I'll make sure my response is friendly and professional, and encourages the user to keep asking questions.
====================Full Response====================
I am Qwen, a large-scale language model developed by Tongyi Lab. I can help you answer questions, create text, write code, and express ideas. I support conversations in multiple languages. How can I help you?
====================Token Usage====================
CompletionUsage(completion_tokens=221, prompt_tokens=10, total_tokens=231, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=172, rejected_prediction_tokens=None), prompt_tokens_details=PromptTokensDetails(audio_tokens=None, cached_tokens=0))Node.js
Sample code
import OpenAI from "openai";
import process from 'process';
// Initialize the OpenAI client
const openai = new OpenAI({
// If the environment variable is not configured, replace it with your Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = ''; // Full thinking process
let answerContent = ''; // Full response
let isAnswering = false; // Indicates whether the response phase has started
async function main() {
try {
const messages = [{ role: 'user', content: 'Who are you' }];
const stream = await openai.chat.completions.create({
model: 'qwen-plus',
messages,
// In the Node.js SDK, non-standard parameters like enable_thinking are passed as top-level properties, not within extra_body.
enable_thinking: true,
stream: true,
stream_options: {
include_usage: true
},
});
console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\n' + '='.repeat(20) + 'Token Usage' + '='.repeat(20) + '\n');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Collect only the thinking content
if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
if (!isAnswering) {
process.stdout.write(delta.reasoning_content);
}
reasoningContent += delta.reasoning_content;
}
// After receiving content, start generating the response
if (delta.content !== undefined && delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Full Response' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
} catch (error) {
console.error('Error:', error);
}
}
main();Response
====================Thinking Process====================
Okay, the user is asking 'Who are you'. I need to figure out what they want to know. They might be interacting with me for the first time or want to confirm my identity. I should start by introducing my name and identity, such as Qwen. Then I should state that I am a large-scale language model independently developed by Tongyi Lab under Alibaba Group. Next, I should mention my capabilities, such as answering questions, creating text, programming, and expressing opinions, so the user understands my purpose. I should also mention that I support multiple languages, which international users will find useful. Finally, I should invite them to ask questions and maintain a friendly and open attitude. I need to use simple and easy-to-understand language, avoiding too much technical jargon. The user might need help or just be curious, so the response should be cordial and encourage further interaction. Additionally, I might need to consider if the user has deeper needs, such as testing my abilities or seeking specific help, but the initial response should focus on basic information and guidance. I will keep the tone conversational and avoid complex sentences to make the information more effective.
====================Full Response====================
Hello! I am Qwen, a large-scale language model independently developed by Tongyi Lab under Alibaba Group. I can help you answer questions, create text (such as stories, official documents, emails, and playbooks), perform logical reasoning, write code, and even express opinions and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish.
If you have any questions or need help, feel free to ask me anytime!
====================Token Usage====================
{
prompt_tokens: 10,
completion_tokens: 288,
total_tokens: 298,
completion_tokens_details: { reasoning_tokens: 188 },
prompt_tokens_details: { cached_tokens: 0 }
}HTTP
Sample code
curl
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-plus",
"messages": [
{
"role": "user",
"content": "Who are you"
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"enable_thinking": true
}'DashScope
Python
Sample code
import os
from dashscope import Generation
import dashscope
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"
# Initialize request parameters
messages = [{"role": "user", "content": "Who are you?"}]
completion = Generation.call(
# If the environment variable is not configured, replace it with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen-plus",
messages=messages,
result_format="message", # Set the result format to message
enable_thinking=True, # Enable the thinking process
stream=True, # Enable streaming output
incremental_output=True, # Enable incremental output
)
reasoning_content = "" # Full thinking process
answer_content = "" # Full response
is_answering = False # Indicates whether the response phase has started
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
for chunk in completion:
message = chunk.output.choices[0].message
# Collect only the thinking content
if message.reasoning_content:
if not is_answering:
print(message.reasoning_content, end="", flush=True)
reasoning_content += message.reasoning_content
# After receiving content, start generating the response
if message.content:
if not is_answering:
print("\n" + "=" * 20 + "Full Response" + "=" * 20 + "\n")
is_answering = True
print(message.content, end="", flush=True)
answer_content += message.content
print("\n" + "=" * 20 + "Token Usage" + "=" * 20 + "\n")
print(chunk.usage)
# After the loop finishes, the reasoning_content and answer_content variables contain the complete content
# You can perform subsequent processing here as needed
# print(f"\n\nFull thinking process:\n{reasoning_content}")
# print(f"\nFull response:\n{answer_content}")
Response
====================Thinking Process====================
Okay, the user is asking 'Who are you?'. I need to figure out what they want to know. They might be interacting with me for the first time or want to confirm my identity. First, I should introduce myself as Qwen and state that I am a large-scale language model developed by Tongyi Lab. Next, I might need to explain my capabilities, such as answering questions, creating text, and programming, so the user understands my purpose. I should also mention that I support multiple languages, so international users know they can communicate in different languages. Finally, I should be friendly and invite them to ask questions to encourage further interaction. I need to use simple and easy-to-understand language, avoiding too much technical jargon. The user might have deeper needs, such as testing my abilities or seeking help, so providing specific examples like writing stories, official documents, or emails would be better. I should also ensure the response is well-structured, perhaps by listing my functions, but a natural transition might be better than using bullets. Additionally, I should emphasize that I am an AI assistant without personal consciousness and all my answers are based on training data to avoid misunderstandings. I might need to check if I've missed any important information, such as multimodal capabilities or recent updates, but based on previous responses, I probably don't need to go too deep. In short, the response should be comprehensive yet concise, friendly, and helpful, making the user feel understood and supported.
====================Full Response====================
I am Qwen, a large-scale language model independently developed by Tongyi Lab under Alibaba Group. I can help you with the following:
1. **Answer questions**: I can try to answer your academic, general knowledge, or domain-specific questions.
2. **Create text**: I can help you write stories, official documents, emails, playbooks, and more.
3. **Logical reasoning**: I can help you with logical reasoning and problem-solving.
4. **Programming**: I can understand and generate code in various programming languages.
5. **Multilingual support**: I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish.
If you have any questions or need help, feel free to ask me anytime!
====================Token Usage====================
{"input_tokens": 11, "output_tokens": 405, "total_tokens": 416, "output_tokens_details": {"reasoning_tokens": 256}, "prompt_tokens_details": {"cached_tokens": 0}}Java
Sample code
// DashScope SDK version >= 2.19.4
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class Main {
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(GenerationResult message) {
String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
String content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (!reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Thinking Process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (!content.isEmpty()) {
finalContent.append(content);
if (!isFirstPrint) {
System.out.println("\n====================Full Response====================");
isFirstPrint = true;
}
System.out.print(content);
}
}
private static GenerationParam buildGenerationParam(Message userMsg) {
return GenerationParam.builder()
// If the environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-plus")
.enableThinking(true)
.incrementalOutput(true)
.resultFormat("message")
.messages(Arrays.asList(userMsg))
.build();
}
public static void streamCallWithMessage(Generation gen, Message userMsg)
throws NoApiKeyException, ApiException, InputRequiredException {
GenerationParam param = buildGenerationParam(userMsg);
Flowable<GenerationResult> result = gen.streamCall(param);
result.blockingForEach(message -> handleGenerationResult(message));
}
public static void main(String[] args) {
try {
Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
streamCallWithMessage(gen, userMsg);
// Print the final result
// if (reasoningContent.length() > 0) {
// System.out.println("\n====================Full Response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
}
}Response
====================Thinking Process====================
Okay, the user is asking 'Who are you?'. I need to figure out what they want to know. They might want to know my identity or are testing my response. First, I should clearly state that I am Qwen, a large-scale language model from Alibaba Group. Then, I might need to briefly introduce my capabilities, such as answering questions, creating text, and programming, so the user understands my purpose. I should also mention that I support multiple languages, so international users know they can communicate in different languages. Finally, I should be friendly and invite them to ask questions, which will make them feel welcome and willing to continue the interaction. I need to make sure the answer is not too long but is comprehensive. The user might have follow-up questions, such as my technical details or use cases, but the initial response should be concise and clear. I will ensure I don't use technical jargon so that all users can understand. I will check if I have missed any important information, such as multilingual support and specific examples of my functions. Okay, this should cover the user's needs.
====================Full Response====================
I am Qwen, a large-scale language model from Alibaba Group. I can answer questions, create text (such as stories, official documents, emails, and playbooks), perform logical reasoning, write code, express opinions, play games, and more. I support conversations in multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to ask me anytime!HTTP
Sample code
curl
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
"model": "qwen-plus",
"input":{
"messages":[
{
"role": "user",
"content": "Who are you?/no_think"
}
]
},
"parameters":{
"enable_thinking": true,
"incremental_output": true,
"result_format": "message"
}
}'Response
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Okay","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"input_tokens":11,"output_tokens":3},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":", ","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"input_tokens":11,"output_tokens":4},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:3
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"the user ","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":16,"input_tokens":11,"output_tokens":5},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:4
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"is asking","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":17,"input_tokens":11,"output_tokens":6},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:5
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"'","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":18,"input_tokens":11,"output_tokens":7},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
......
id:358
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"help","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":373,"input_tokens":11,"output_tokens":362},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:359
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":", ","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":374,"input_tokens":11,"output_tokens":363},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:360
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"feel free ","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":375,"input_tokens":11,"output_tokens":364},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:361
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"to ask ","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":376,"input_tokens":11,"output_tokens":365},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:362
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"me anytime","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":377,"input_tokens":11,"output_tokens":366},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:363
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:364
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}Additionally, the hybrid thinking models of open source Qwen3, qwen-plus-2025-04-28, and qwen-turbo-2025-04-28 provide a method to dynamically control the thinking mode using prompts. When enable_thinking is true, add /no_think to the prompt to disable the thinking mode. To re-enable the thinking mode in a multi-turn conversation, add /think to the latest input prompt. The model follows the most recent /think or /no_think instruction.
Limit the thinking length
Sometimes, models in deep thinking mode generate long reasoning processes, which increases wait times and consumes more tokens. Use the thinking_budget parameter to limit the maximum number of tokens for the inference process. If the limit is exceeded, the model immediately generates a response.
thinking_budget is the model's maximum chain-of-thought length, see Models.Qwen3 (thinking mode) and Kimi supports the thinking_budget parameter.
OpenAI compatible
Python
Sample code
from openai import OpenAI
import os
# Initialize the OpenAI client
client = OpenAI(
# If the environment variable is not configured, replace api_key="sk-xxx" with your Model Studio API key.
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
messages = [{"role": "user", "content": "Who are you?"}]
completion = client.chat.completions.create(
model="qwen-plus",
messages=messages,
# The enable_thinking parameter enables the thinking process, and the thinking_budget parameter sets the maximum number of tokens for the inference process.
extra_body={
"enable_thinking": True,
"thinking_budget": 50
},
stream=True,
stream_options={
"include_usage": True
},
)
reasoning_content = "" # Full thinking process
answer_content = "" # Full response
is_answering = False # Indicates whether the model is in the response stage
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
for chunk in completion:
if not chunk.choices:
print("\nUsage:")
print(chunk.usage)
continue
delta = chunk.choices[0].delta
# Collect only the thinking content
if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
if not is_answering:
print(delta.reasoning_content, end="", flush=True)
reasoning_content += delta.reasoning_content
# When content is received, start generating the response
if hasattr(delta, "content") and delta.content:
if not is_answering:
print("\n" + "=" * 20 + "Full Response" + "=" * 20 + "\n")
is_answering = True
print(delta.content, end="", flush=True)
answer_content += delta.contentResponse
====================Thinking Process====================
Okay, the user is asking "Who are you?" I need to give a clear and friendly answer. First, I should state my identity, which is Qwen, developed by the Qwen team at Alibaba Group. Next, I should explain my main functions, such as answering
====================Full Response====================
I am Qwen, a large-scale language model developed by the Qwen team at Alibaba Group. I can answer questions, create text, perform logical reasoning, and write code. My purpose is to provide help and convenience to users. How can I help you?Node.js
Sample code
import OpenAI from "openai";
import process from 'process';
// Initialize the OpenAI client
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY, // Read from environment variables
baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
async function main() {
try {
const messages = [{ role: 'user', content: 'Who are you?' }];
const stream = await openai.chat.completions.create({
model: 'qwen-plus',
messages,
stream: true,
// The enable_thinking parameter enables the thinking process, and the thinking_budget parameter sets the maximum number of tokens for the inference process.
enable_thinking: true,
thinking_budget: 50
});
console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\nUsage:');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Collect only the thinking content
if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
if (!isAnswering) {
process.stdout.write(delta.reasoning_content);
}
reasoningContent += delta.reasoning_content;
}
// When content is received, start generating the response
if (delta.content !== undefined && delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Full Response' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
} catch (error) {
console.error('Error:', error);
}
}
main();Response
====================Thinking Process====================
Okay, the user is asking "Who are you?" I need to give a clear and accurate answer. First, I should introduce myself as Qwen, developed by the Qwen team at Alibaba Group. Next, I should explain my main functions, such as answering questions
====================Full Response====================
I am Qwen, a large-scale language model independently developed by the Qwen team at Alibaba Group. I can perform various tasks such as answering questions, creating text, performing logical reasoning, and writing code. If you have any questions or need help, feel free to ask me anytime!HTTP
Sample code
curl
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-plus",
"messages": [
{
"role": "user",
"content": "Who are you?"
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"enable_thinking": true,
"thinking_budget": 50
}'Response
data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
.....
data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":10,"completion_tokens":360,"total_tokens":370},"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
data: [DONE]DashScope
Python
Sample code
import os
from dashscope import Generation
import dashscope
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"
messages = [{"role": "user", "content": "Who are you?"}]
completion = Generation.call(
# If the environment variable is not configured, replace the following line with your Model Studio API key: api_key = "sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen-plus",
messages=messages,
result_format="message",
enable_thinking=True,
# Set the maximum number of tokens for the inference process.
thinking_budget=50,
stream=True,
incremental_output=True,
)
# Define the full thinking process.
reasoning_content = ""
# Define the full response.
answer_content = ""
# Determine whether to end the thinking process and start the response.
is_answering = False
print("=" * 20 + "Thinking Process" + "=" * 20)
for chunk in completion:
# If both the thinking process and the response are empty, ignore.
if (
chunk.output.choices[0].message.content == ""
and chunk.output.choices[0].message.reasoning_content == ""
):
pass
else:
# If it is currently in the thinking process.
if (
chunk.output.choices[0].message.reasoning_content != ""
and chunk.output.choices[0].message.content == ""
):
print(chunk.output.choices[0].message.reasoning_content, end="", flush=True)
reasoning_content += chunk.output.choices[0].message.reasoning_content
# If it is currently in the response stage.
elif chunk.output.choices[0].message.content != "":
if not is_answering:
print("\n" + "=" * 20 + "Full Response" + "=" * 20)
is_answering = True
print(chunk.output.choices[0].message.content, end="", flush=True)
answer_content += chunk.output.choices[0].message.content
# To print the full thinking process and the full response, uncomment and run the following code.
# print("=" * 20 + "Full Thinking Process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Full Response" + "=" * 20 + "\n")
# print(f"{answer_content}")
Response
====================Thinking Process====================
Okay, the user is asking "Who are you?" I need to give a clear and friendly answer. First, I should introduce myself as Qwen, developed by the Qwen team at Alibaba Group. Next, I should explain my main functions, such as
====================Full Response====================
I am Qwen, a large-scale language model independently developed by the Qwen team at Alibaba Group. I can answer questions, create text, perform logical reasoning, and write code. My purpose is to provide users with comprehensive, accurate, and useful information and assistance. How can I help you?Java
Sample code
// DashScope SDK version >= 2.19.4
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(GenerationResult message) {
String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
String content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (!reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Thinking Process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (!content.isEmpty()) {
finalContent.append(content);
if (!isFirstPrint) {
System.out.println("\n====================Full Response====================");
isFirstPrint = true;
}
System.out.print(content);
}
}
private static GenerationParam buildGenerationParam(Message userMsg) {
return GenerationParam.builder()
// If the environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-plus")
.enableThinking(true)
.thinkingBudget(50)
.incrementalOutput(true)
.resultFormat("message")
.messages(Arrays.asList(userMsg))
.build();
}
public static void streamCallWithMessage(Generation gen, Message userMsg)
throws NoApiKeyException, ApiException, InputRequiredException {
GenerationParam param = buildGenerationParam(userMsg);
Flowable<GenerationResult> result = gen.streamCall(param);
result.blockingForEach(message -> handleGenerationResult(message));
}
public static void main(String[] args) {
try {
Generation gen = new Generation();
Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
streamCallWithMessage(gen, userMsg);
// Print the final result.
// if (reasoningContent.length() > 0) {
// System.out.println("\n====================Full Response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
}
}Response
====================Thinking Process====================
Okay, the user is asking "Who are you?" I need to give a clear and friendly answer. First, I should introduce myself as Qwen, developed by the Qwen team at Alibaba Group. Next, I should explain my main functions, such as
====================Full Response====================
I am Qwen, a large-scale language model independently developed by the Qwen team at Alibaba Group. I can answer questions, create text, perform logical reasoning, and write code. My purpose is to provide users with comprehensive, accurate, and useful information and assistance. How can I help you?HTTP
Sample code
curl
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
"model": "qwen-plus",
"input":{
"messages":[
{
"role": "user",
"content": "Who are you?"
}
]
},
"parameters":{
"enable_thinking": true,
"thinking_budget": 50,
"incremental_output": true,
"result_format": "message"
}
}'Response
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Okay","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"output_tokens":3,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":1}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"output_tokens":4,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":2}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
......
id:133
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":149,"output_tokens":138,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":50}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
id:134
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":149,"output_tokens":138,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":50}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}Other features
Billing details
Thinking process is billed as output tokens.
Some hybrid thinking models are priced differently for thinking and non-thinking modes.
If a model in thinking mode does not output a thought process, it is billed at the non-thinking price.
FAQ
Q: How to disable the thinking mode?
Q: Which models support non-streaming output?
Q: How to purchase tokens after my free quota is used up?
Q: Can I upload images or documents to ask questions?
Q: How to view token usage and call counts?
API reference
For the input and output parameters, see Qwen.
Error codes
If an error occurs during execution, see Error Messages for troubleshooting.
