Deep thinking models reason before responding to improve accuracy for complex tasks such as logical reasoning and numerical calculations. This topic describes how to call deep thinking models such as Qwen and DeepSeek.
Usage
Model Studio provides APIs for various deep thinking models. These APIs support two modes: hybrid thinking and thinking-only.
- Hybrid thinking mode: Use the enable_thinking parameter to control whether to enable thinking mode:
  - true: The model responds after thinking.
  - false: The model responds directly.
OpenAI compatible
```python
# Import dependencies and create a client...
completion = client.chat.completions.create(
    model="qwen-plus",  # Select a model
    messages=[{"role": "user", "content": "Who are you"}],
    # Because enable_thinking is not a standard OpenAI parameter, pass it through extra_body.
    extra_body={"enable_thinking": True},
    # Call in streaming output mode.
    stream=True,
    # Make the last packet of the streaming response include token consumption information.
    stream_options={"include_usage": True},
)
```

DashScope
The DashScope API for Qwen3.5 uses a multimodal interface, so the following example returns a URL error. For the correct invocation method, see Enable or disable thinking mode.

```python
# Import dependencies...
response = Generation.call(
    # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # You can replace this with other deep thinking models as needed.
    model="qwen-plus",
    messages=messages,
    result_format="message",
    enable_thinking=True,
    stream=True,
    incremental_output=True,
)
```
- Thinking-only mode: The model always thinks before responding, and this feature cannot be disabled. The request format is the same as in hybrid thinking mode, except that you do not need to set the enable_thinking parameter.
The reasoning content is returned in the reasoning_content field, and the response content is returned in the content field. Because deep thinking models must reason before responding, response latency increases. Most of these models support only streaming output. Therefore, this topic uses streaming calls as examples.
Supported models
Qwen3.5
- Commercial edition
  - Qwen3.5 Plus series (hybrid thinking mode, enabled by default): qwen3.5-plus, qwen3.5-plus-2026-02-15
  - Qwen3.5 Flash series (hybrid thinking mode, enabled by default): qwen3.5-flash, qwen3.5-flash-2026-02-23
- Open source edition
  - Hybrid thinking mode, enabled by default: qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b
Qwen3
- Commercial edition
  - Qwen Max series (hybrid thinking mode, disabled by default): qwen3-max-2026-01-23, qwen3-max-preview
  - Qwen Plus series (hybrid thinking mode, disabled by default): qwen-plus, qwen-plus-latest, qwen-plus-2025-04-28 and later snapshots
  - Qwen Flash series (hybrid thinking mode, disabled by default): qwen-flash, qwen-flash-2025-07-28 and later snapshots
  - Qwen Turbo series (hybrid thinking mode, disabled by default): qwen-turbo, qwen-turbo-latest, qwen-turbo-2025-04-28 and later snapshots
- Open source edition
  - Hybrid thinking mode, enabled by default: qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b
  - Thinking-only mode: qwen3-next-80b-a3b-thinking, qwen3-235b-a22b-thinking-2507, qwen3-30b-a3b-thinking-2507
QwQ (based on Qwen2.5)
Thinking-only mode: qwq-plus, qwq-plus-latest, qwq-plus-2025-03-05, qwq-32b
DeepSeek (Beijing)
- Hybrid thinking mode, disabled by default: deepseek-v3.2, deepseek-v3.2-exp, deepseek-v3.1
- Thinking-only mode: deepseek-r1, deepseek-r1-0528, deepseek-r1 distilled models
GLM (Beijing)
Hybrid thinking mode, enabled by default: glm-5, glm-4.7, glm-4.6
Kimi (Beijing)
Thinking-only mode: kimi-k2-thinking
For information such as model names, context windows, pricing, and snapshot versions, see Model list. For information about rate limits, see Rate limiting.
Getting started
Prerequisites: You have obtained an API key and configured it as an environment variable. If you make calls using an SDK, install the OpenAI or DashScope SDK. The DashScope Java SDK version must be 2.19.4 or later.
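As a quick reference, on a Linux or macOS shell the setup typically looks like the following minimal sketch (the key value shown is a placeholder; install only the SDK you plan to use):

```shell
# Set the API key as an environment variable (replace sk-xxx with your actual key).
export DASHSCOPE_API_KEY="sk-xxx"

# Install the Python SDKs.
pip install -U openai      # OpenAI SDK
pip install -U dashscope   # DashScope SDK
```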
Run the following code to call qwen-plus in thinking mode with streaming output.
OpenAI compatible
Python
Sample code
```python
from openai import OpenAI
import os

# Initialize the OpenAI client
client = OpenAI(
    # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the base_url for the Singapore region. If you use a model in the Virginia region, replace the base_url with https://dashscope-us.aliyuncs.com/compatible-mode/v1
    # If you use a model in the Beijing region, replace the base_url with https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

messages = [{"role": "user", "content": "Who are you"}]
completion = client.chat.completions.create(
    model="qwen-plus",  # You can replace this with other deep thinking models as needed.
    messages=messages,
    extra_body={"enable_thinking": True},
    stream=True,
    stream_options={"include_usage": True},
)

reasoning_content = ""  # Full thinking process
answer_content = ""     # Full response
is_answering = False    # Whether the response phase has started

print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")
for chunk in completion:
    if not chunk.choices:
        print("\nUsage:")
        print(chunk.usage)
        continue
    delta = chunk.choices[0].delta
    # Collect only the reasoning content
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content
    # When content is received, start responding
    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content
```
Response
====================Thinking process====================
Okay, the user is asking "Who are you". I need to provide an accurate and friendly answer. First, I must confirm my identity, which is Qwen, developed by the Tongyi Lab under Alibaba Group. Next, I should explain my main functions, such as answering questions, creating text, and logical reasoning. I should also maintain a friendly tone and avoid being too technical to make the user feel at ease. I must also be careful not to use complex terminology and ensure the answer is concise and clear. Additionally, I might need to add some interactive elements, inviting the user to ask questions to encourage further communication. Finally, I will check if I have missed any important information, such as my Chinese name "Qwen" and English name "Qwen", along with my parent company and lab. I need to ensure the answer is comprehensive and meets the user's expectations.
====================Full response====================
Hello! I am Qwen, an ultra-large-scale language model independently developed by the Tongyi Lab under Alibaba Group. I can answer questions, create text, perform logical reasoning, write code, and more, with the goal of providing users with high-quality information and services. You can call me Qwen. How can I help you?
Node.js
Sample code
```javascript
import OpenAI from "openai";
import process from 'process';

// Initialize the OpenAI client
const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY, // Read from environment variables
    // The following is the base_url for the Singapore region. If you use a model in the Virginia region, replace the base_url with https://dashscope-us.aliyuncs.com/compatible-mode/v1
    // If you use a model in the Beijing region, replace the base_url with https://dashscope.aliyuncs.com/compatible-mode/v1
    baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = '';
let answerContent = '';
let isAnswering = false;

async function main() {
    try {
        const messages = [{ role: 'user', content: 'Who are you' }];
        const stream = await openai.chat.completions.create({
            model: 'qwen-plus',
            messages,
            stream: true,
            enable_thinking: true
        });
        console.log('\n' + '='.repeat(20) + 'Thinking process' + '='.repeat(20) + '\n');
        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\nUsage:');
                console.log(chunk.usage);
                continue;
            }
            const delta = chunk.choices[0].delta;
            // Collect only the reasoning content
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }
            // When content is received, start responding
            if (delta.content !== undefined && delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();
```
Response
====================Thinking process====================
Okay, the user is asking "Who are you". I need to respond with my identity. First, I should clearly state that I am Qwen, an ultra-large-scale language model developed by Alibaba Cloud. Next, I can mention my main functions, such as answering questions, creating text, and logical reasoning. I should also emphasize my multilingual support, including Chinese and English, so the user knows I can handle requests in different languages. Additionally, I might need to explain my application scenarios, such as helping with study, work, and daily life. However, the user's question is quite direct, so I probably don't need to provide too much detail. I should keep it concise and clear. At the same time, I must ensure a friendly tone and invite the user to ask further questions. I will check for any missed important information, such as my version or latest updates, but the user probably doesn't need that level of detail. Finally, I will confirm that the answer is accurate and free of errors.
====================Full response====================
I am Qwen, an ultra-large-scale language model independently developed by the Tongyi Lab under Alibaba Group. I am capable of various tasks, including answering questions, creating text, logical reasoning, and coding. I support multiple languages, including Chinese and English. If you have any questions or need help, feel free to let me know!
HTTP
Sample code
curl
```shell
# ======= Important =======
# The following is the base_url for Singapore. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# If you use a model in the Virginia region, replace the base_url with: https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [
      {
        "role": "user",
        "content": "Who are you"
      }
    ],
    "stream": true,
    "stream_options": {
      "include_usage": true
    },
    "enable_thinking": true
  }'
```
Response
data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
.....
data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":10,"completion_tokens":360,"total_tokens":370},"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
data: [DONE]
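The stream above follows the Server-Sent Events convention: each chunk arrives on a `data:` line, and the final line is `data: [DONE]`. As an illustration of how such chunks can be consumed without an SDK, the following minimal sketch parses a few abbreviated stand-in lines (not a verbatim API response):

```python
import json

# Abbreviated stand-ins for the data lines of a streamed response.
sse_lines = [
    'data: {"choices":[{"delta":{"reasoning_content":"Thinking...","content":null},"index":0}]}',
    'data: {"choices":[{"delta":{"reasoning_content":null,"content":"Hello!"},"index":0}]}',
    'data: {"choices":[],"usage":{"prompt_tokens":10,"completion_tokens":360,"total_tokens":370}}',
    'data: [DONE]',
]

reasoning, answer, usage = "", "", None
for line in sse_lines:
    payload = line[len("data: "):]
    if payload == "[DONE]":        # end-of-stream sentinel
        break
    chunk = json.loads(payload)
    if not chunk["choices"]:       # the final packet carries only usage
        usage = chunk.get("usage")
        continue
    delta = chunk["choices"][0]["delta"]
    # reasoning_content and content are mutually exclusive per chunk; either may be null.
    reasoning += delta.get("reasoning_content") or ""
    answer += delta.get("content") or ""

print(reasoning)  # Thinking...
print(answer)     # Hello!
```

A real client would read these lines from the HTTP response body incrementally; the accumulation logic is the same.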
DashScope
The DashScope API for Qwen3.5 uses a multimodal interface, so the following example returns a URL error. For the correct invocation method, see Enable or disable thinking mode.
Python
Sample code
```python
import os
from dashscope import Generation
import dashscope

# Base URL for the Singapore region. For the US (Virginia) region, use https://dashscope-us.aliyuncs.com/api/v1.
# For the China (Beijing) region, use https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

messages = [{"role": "user", "content": "Who are you?"}]
completion = Generation.call(
    # Replace the following line with api_key="sk-xxx" if you do not configure the environment variable.
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen-plus",
    messages=messages,
    result_format="message",
    enable_thinking=True,
    stream=True,
    incremental_output=True,
)

reasoning_content = ""  # Store the full reasoning content.
answer_content = ""     # Store the full response.
is_answering = False    # Track whether reasoning has ended and the response has started.

print("=" * 20 + "Reasoning process" + "=" * 20)
for chunk in completion:
    message = chunk.output.choices[0].message
    # Skip empty chunks.
    if message.content == "" and message.reasoning_content == "":
        continue
    # Print reasoning content.
    if message.reasoning_content != "" and message.content == "":
        print(message.reasoning_content, end="", flush=True)
        reasoning_content += message.reasoning_content
    # Print response content.
    elif message.content != "":
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20)
            is_answering = True
        print(message.content, end="", flush=True)
        answer_content += message.content

# Uncomment the following lines to print the full reasoning content and full response.
# print("=" * 20 + "Full reasoning content" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Full response" + "=" * 20 + "\n")
# print(f"{answer_content}")
```
Response
====================Reasoning process====================
The user asks: “Who are you?” I need to answer this question. First, I identify myself as Qwen, a large-scale language model developed by Alibaba Cloud. Next, I describe my capabilities, such as answering questions, generating text, logical reasoning, and programming. My goal is to support users as a helpful assistant.
I keep my tone conversational and avoid technical terms or complex sentences. I add friendly phrases like “Hello!” to make the interaction natural. I ensure accuracy and cover key points, including my developer, main functions, and use cases.
I also anticipate follow-up questions, such as examples or technical details. So I hint at broader support—for example, by saying “I can help with everyday questions or professional topics.” This keeps the response open and inviting.
Finally, I check for flow, repetition, or redundancy. I keep it concise, friendly, and professional.
====================Full response====================
Hello! I am Qwen, a large-scale language model developed by Alibaba Cloud. I can answer questions, generate text—such as stories, official documents, emails, scripts—and perform logical reasoning and programming. I aim to support and assist you. Whether your question is about daily life or a professional topic, I will do my best to help. Is there anything I can assist you with?
Java
Sample code
```java
// dashscope SDK version >= 2.19.4
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    static {
        // Base URL for the Singapore region. For the US (Virginia) region, use https://dashscope-us.aliyuncs.com/api/v1.
        // For the China (Beijing) region, use https://dashscope.aliyuncs.com/api/v1.
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }

    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;

    private static void handleGenerationResult(GenerationResult message) {
        String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String content = message.getOutput().getChoices().get(0).getMessage().getContent();
        if (!reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Reasoning process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }
        if (!content.isEmpty()) {
            finalContent.append(content);
            if (!isFirstPrint) {
                System.out.println("\n====================Full response====================");
                isFirstPrint = true;
            }
            System.out.print(content);
        }
    }

    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                // Replace the following line with .apiKey("sk-xxx") if you do not configure the environment variable.
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-plus")
                .enableThinking(true)
                .incrementalOutput(true)
                .resultFormat("message")
                .messages(Arrays.asList(userMsg))
                .build();
    }

    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.blockingForEach(message -> handleGenerationResult(message));
    }

    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
            streamCallWithMessage(gen, userMsg);
            // Print the final result.
            // if (reasoningContent.length() > 0) {
            //     System.out.println("\n====================Full response====================");
            //     System.out.println(finalContent.toString());
            // }
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            logger.error("An exception occurred: {}", e.getMessage());
        }
        System.exit(0);
    }
}
```
Response
====================Reasoning process====================
The user asks, “Who are you?” I must answer based on my identity. I am Qwen, a large-scale language model from Alibaba Group. I keep my reply conversational and simple.
The user may be new to me or confirming my identity. I start by stating who I am, then briefly list my abilities—answering questions, writing stories, drafting documents, coding, and more. I mention multilingual support so users know I handle multiple languages.
To sound human, I use a friendly tone and maybe an emoji. I also invite further questions or tasks, like asking how I can help.
I avoid jargon and long sentences. I double-check for missing points, like multilingual support and core skills. I ensure the reply is clear, friendly, and professional.
====================Full response====================
Hello! I am Qwen, a large-scale language model from Alibaba Group. I can answer questions, write stories, draft official documents, compose emails, create scripts, perform logical reasoning, code, express opinions, and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. How can I help you?
HTTP
Sample code
curl
```shell
# ======= Important notice =======
# The following URL is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# For the US (Virginia) region, use: https://dashscope-us.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === Remove this comment before running ===
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-DashScope-SSE: enable" \
  -d '{
    "model": "qwen-plus",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "Who are you?"
        }
      ]
    },
    "parameters": {
      "enable_thinking": true,
      "incremental_output": true,
      "result_format": "message"
    }
  }'
```
Response
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Okay","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"input_tokens":11,"output_tokens":3},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"input_tokens":11,"output_tokens":4},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:3
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"user","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":16,"input_tokens":11,"output_tokens":5},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:4
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"asks","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":17,"input_tokens":11,"output_tokens":6},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:5
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"\"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":18,"input_tokens":11,"output_tokens":7},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
......
id:358
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"Help","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":373,"input_tokens":11,"output_tokens":362},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:359
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":",","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":374,"input_tokens":11,"output_tokens":363},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:360
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"Feel free","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":375,"input_tokens":11,"output_tokens":364},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:361
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"to","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":376,"input_tokens":11,"output_tokens":365},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:362
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"let me know","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":377,"input_tokens":11,"output_tokens":366},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:363
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:364
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
Core capabilities
Switch between thinking and non-thinking modes
Enabling thinking mode usually improves response quality but increases latency and cost. With models that support hybrid thinking mode, you can dynamically switch between thinking and non-thinking modes based on question complexity, without changing models:
- For tasks that do not require complex reasoning (such as casual chat or simple Q&A), set enable_thinking to false to disable thinking mode.
- For tasks that require complex reasoning (such as logical reasoning, code generation, or math problem solving), set enable_thinking to true to enable thinking mode.
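One way to apply this rule per request is a small routing helper. The sketch below is illustrative only: the is_complex heuristic and build_request helper are hypothetical names, not part of any SDK, and a real classifier would be task-specific. The returned kwargs can be passed to client.chat.completions.create(**kwargs) with the OpenAI SDK.

```python
# Illustrative heuristic only: real routing logic would be task-specific.
def is_complex(prompt: str) -> bool:
    keywords = ("prove", "calculate", "debug", "step by step")
    return any(k in prompt.lower() for k in keywords)

def build_request(prompt: str) -> dict:
    """Build chat-completion kwargs, enabling thinking mode only for complex tasks."""
    return {
        "model": "qwen-plus",
        "messages": [{"role": "user", "content": prompt}],
        # enable_thinking is not a standard OpenAI parameter, so it goes in extra_body.
        "extra_body": {"enable_thinking": is_complex(prompt)},
        "stream": True,
        "stream_options": {"include_usage": True},
    }

print(build_request("hello")["extra_body"])              # {'enable_thinking': False}
print(build_request("calculate 17 * 23")["extra_body"])  # {'enable_thinking': True}
```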
OpenAI compatible
enable_thinking is not an OpenAI standard parameter. If you use the OpenAI Python SDK, pass it through extra_body. In the Node.js SDK, pass it as a top-level parameter.
Python
Example code
```python
from openai import OpenAI
import os

# Initialize the OpenAI client
client = OpenAI(
    # If you haven't configured an environment variable, replace this with your Alibaba Cloud Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # This is the base_url for the Singapore region. If you use a model in the Virginia region, replace base_url with https://dashscope-us.aliyuncs.com/compatible-mode/v1. If you use a model in the Beijing region, replace base_url with https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

messages = [{"role": "user", "content": "Who are you?"}]
completion = client.chat.completions.create(
    model="qwen-plus",
    messages=messages,
    # Use extra_body to set enable_thinking and enable the reasoning process
    extra_body={"enable_thinking": True},
    stream=True,
    stream_options={"include_usage": True},
)

reasoning_content = ""  # Full reasoning process
answer_content = ""     # Full response
is_answering = False    # Whether the response phase has started

print("\n" + "=" * 20 + "Reasoning process" + "=" * 20 + "\n")
for chunk in completion:
    if not chunk.choices:
        print("\n" + "=" * 20 + "Token usage" + "=" * 20 + "\n")
        print(chunk.usage)
        continue
    delta = chunk.choices[0].delta
    # Collect only reasoning content
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content
    # Received content; start responding
    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content
```
Response
====================Reasoning process====================
Hmm, the user asked "Who are you?" I need to figure out what they really want to know. They might be encountering me for the first time or verifying my identity. I should start by introducing myself as Qwen, developed by Tongyi Lab. Then explain my capabilities—answering questions, generating text, coding, etc.—so users understand how I can help. Mentioning multilingual support shows international users they can interact in their preferred language. End with a friendly invitation to ask more questions to encourage further interaction. Keep it concise and avoid excessive technical jargon so it's easy to understand. The user likely wants a quick overview of my abilities, so focus on features and use cases. Also check if any key details are missing, like mentioning Alibaba Group or deeper technical specs. But basic info is probably sufficient here. Ensure the tone stays friendly and professional while inviting follow-up questions.
====================Full response====================
I am Qwen, a large-scale language model developed by Tongyi Lab. I can help you answer questions, create text, write code, express opinions, and more—all in multiple languages. Is there anything I can assist you with?
====================Token usage====================
CompletionUsage(completion_tokens=221, prompt_tokens=10, total_tokens=231, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=172, rejected_prediction_tokens=None), prompt_tokens_details=PromptTokensDetails(audio_tokens=None, cached_tokens=0))
Node.js
Example code
```javascript
import OpenAI from "openai";
import process from 'process';

// Initialize the OpenAI client
const openai = new OpenAI({
    // If you haven't configured an environment variable, replace this with your Alibaba Cloud Model Studio API key: apiKey: "sk-xxx"
    apiKey: process.env.DASHSCOPE_API_KEY,
    // This is the base_url for the Singapore region. If you use a model in the Virginia region, replace baseURL with https://dashscope-us.aliyuncs.com/compatible-mode/v1. If you use a model in the Beijing region, replace baseURL with https://dashscope.aliyuncs.com/compatible-mode/v1
    baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = ''; // Full reasoning process
let answerContent = '';    // Full response
let isAnswering = false;   // Whether the response phase has started

async function main() {
    try {
        const messages = [{ role: 'user', content: 'Who are you?' }];
        const stream = await openai.chat.completions.create({
            model: 'qwen-plus',
            messages,
            // In the Node.js SDK, non-standard parameters like enable_thinking are passed as top-level properties, not inside extra_body
            enable_thinking: true,
            stream: true,
            stream_options: {
                include_usage: true
            },
        });
        console.log('\n' + '='.repeat(20) + 'Reasoning process' + '='.repeat(20) + '\n');
        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\n' + '='.repeat(20) + 'Token usage' + '='.repeat(20) + '\n');
                console.log(chunk.usage);
                continue;
            }
            const delta = chunk.choices[0].delta;
            // Collect only reasoning content
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }
            // Received content; start responding
            if (delta.content !== undefined && delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();
```
Response
====================Reasoning process====================
Hmm, the user asked "Who are you?" I need to determine what they're looking for. They might be new to me or confirming my identity. Start by introducing my name—Qwen—and mention I'm a large-scale language model independently developed by Tongyi Lab under Alibaba Group. Next, highlight my capabilities: answering questions, creating text (like stories, official documents, emails, scripts), logical reasoning, coding, expressing opinions, and even playing games. Emphasize multilingual support—including Chinese, English, German, French, Spanish, and more—so international users feel included. End with an open, friendly invitation to ask questions. Keep language simple and conversational, avoiding complex sentences or jargon. The user might be testing my abilities or seeking specific help, but for a first reply, stick to core info and guidance. Stay approachable to encourage further interaction.
====================Full response====================
Hello! I'm Qwen, a large-scale language model independently developed by Tongyi Lab under Alibaba Group. I can help you answer questions, create text (like stories, official documents, emails, scripts), perform logical reasoning, write code, express opinions, and even play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish.
If you have any questions or need assistance, just let me know!
====================Token usage====================
{
prompt_tokens: 10,
completion_tokens: 288,
total_tokens: 298,
completion_tokens_details: { reasoning_tokens: 188 },
prompt_tokens_details: { cached_tokens: 0 }
}
HTTP
Example code
curl
# ======= Important notes =======
# This is the base_url for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# If you use a model in the Virginia region, replace the URL with: https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-plus",
"messages": [
{
"role": "user",
"content": "Who are you?"
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"enable_thinking": true
}'
DashScope
The DashScope API for Qwen3.5 uses a multimodal interface, so the following example returns a URL error. For the correct invocation method, see Enable or disable thinking mode.
Python
Example code
import os
from dashscope import Generation
import dashscope
# This is the base_url for the Singapore region. If you use a model in the Virginia region, replace base_url with https://dashscope-us.aliyuncs.com/api/v1
# If you use a model in the Beijing region, replace base_url with https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"
# Initialize request parameters
messages = [{"role": "user", "content": "Who are you?"}]
completion = Generation.call(
# If you haven't configured an environment variable, replace this with your Alibaba Cloud Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen-plus",
messages=messages,
result_format="message", # Set result format to message
enable_thinking=True, # Enable reasoning process
stream=True, # Enable streaming output
incremental_output=True, # Enable incremental output
)
reasoning_content = "" # Full reasoning process
answer_content = "" # Full response
is_answering = False # Whether the response phase has started
print("\n" + "=" * 20 + "Reasoning process" + "=" * 20 + "\n")
for chunk in completion:
message = chunk.output.choices[0].message
# Collect only reasoning content
if message.reasoning_content:
if not is_answering:
print(message.reasoning_content, end="", flush=True)
reasoning_content += message.reasoning_content
# Received content; start responding
if message.content:
if not is_answering:
print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
is_answering = True
print(message.content, end="", flush=True)
answer_content += message.content
print("\n" + "=" * 20 + "Token usage" + "=" * 20 + "\n")
print(chunk.usage)
# After the loop, reasoning_content and answer_content contain the complete content
# You can add further processing here as needed
# print(f"\n\nFull reasoning process:\n{reasoning_content}")
# print(f"\nFull response:\n{answer_content}")
Response
====================Reasoning process====================
Hmm, the user asked "Who are you?" I need to determine what they're looking for. They might be encountering me for the first time or verifying my identity. First, introduce my name—Qwen—and state I'm a large-scale language model developed by Tongyi Lab. Next, explain my capabilities: answering questions, creating text, coding, etc., so users understand my utility. Mention multilingual support to show international users they can interact in their preferred language. End with a friendly invitation to ask questions to encourage further interaction. Use clear, simple language and avoid excessive technical terms. The user might have deeper needs—like testing my abilities or seeking specific help—so providing concrete examples (writing stories, official documents, emails, etc.) helps. Ensure the response flows naturally without bullet points. Also clarify I'm an AI assistant without personal consciousness, basing all answers on training data to prevent misunderstandings. Check for missing key info like multimodal capabilities or recent updates, but keep it concise. Overall, aim for a helpful, friendly, and supportive reply that makes users feel understood.
====================Full response====================
I am Qwen, a large-scale language model independently developed by Tongyi Lab under Alibaba Group. I can help you:
1. **Answer questions**: Whether academic, general knowledge, or domain-specific, I'll do my best to assist.
2. **Create text**: Write stories, official documents, emails, scripts—I can handle them all.
3. **Logical reasoning**: I can help solve problems through logical analysis.
4. **Programming**: I understand and generate code in multiple programming languages.
5. **Multilingual support**: I support many languages, including but not limited to Chinese, English, German, French, and Spanish.
If you have any questions or need help, just let me know!
====================Token usage====================
{"input_tokens": 11, "output_tokens": 405, "total_tokens": 416, "output_tokens_details": {"reasoning_tokens": 256}, "prompt_tokens_details": {"cached_tokens": 0}}
Java
Example code
// DashScope SDK version >= 2.19.4
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class Main {
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(GenerationResult message) {
String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
String content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (reasoning != null && !reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Reasoning process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (content != null && !content.isEmpty()) {
finalContent.append(content);
if (!isFirstPrint) {
System.out.println("\n====================Full response====================");
isFirstPrint = true;
}
System.out.print(content);
}
}
private static GenerationParam buildGenerationParam(Message userMsg) {
return GenerationParam.builder()
// If you haven't configured an environment variable, replace the next line with: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-plus")
.enableThinking(true)
.incrementalOutput(true)
.resultFormat("message")
.messages(Arrays.asList(userMsg))
.build();
}
public static void streamCallWithMessage(Generation gen, Message userMsg)
throws NoApiKeyException, ApiException, InputRequiredException {
GenerationParam param = buildGenerationParam(userMsg);
Flowable<GenerationResult> result = gen.streamCall(param);
result.blockingForEach(message -> handleGenerationResult(message));
}
public static void main(String[] args) {
try {
// This is the base_url for the Singapore region. If you use a model in the Virginia region, replace base_url with https://dashscope-us.aliyuncs.com/api/v1
// If you use a model in the Beijing region, replace base_url with https://dashscope.aliyuncs.com/api/v1
Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
streamCallWithMessage(gen, userMsg);
// Print final results
// if (finalContent.length() > 0) {
// System.out.println("\n====================Full response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
}
}
Response
====================Reasoning process====================
Hmm, the user asked "Who are you?" I need to figure out what they want to know. They might be curious about my identity or testing my response. Start by clearly stating I'm Qwen, a large-scale language model under Alibaba Group. Briefly outline my capabilities—answering questions, creating text, coding—to show my utility. Mention multilingual support so international users know they can interact in their preferred language. End with a friendly invitation to ask questions to make them feel welcome. Keep the response concise but informative. The user might have follow-up questions about technical details or use cases, but the initial reply should stay simple and clear. Avoid jargon so all users can understand. Double-check for key omissions like multilingual support and specific feature examples. This should cover their needs.
====================Full response====================
I am Qwen, a large-scale language model under Alibaba Group. I can answer questions, create text (such as stories, official documents, emails, scripts), perform logical reasoning, write code, express opinions, play games, and support multilingual communication—including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need assistance, feel free to ask!
HTTP
Example code
curl
# ======= Important notes =======
# This is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# If you use a model in the Virginia region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === Delete this comment before execution ===
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
"model": "qwen-plus",
"input":{
"messages":[
{
"role": "user",
"content": "Who are you?"
}
]
},
"parameters":{
"enable_thinking": true,
"incremental_output": true,
"result_format": "message"
}
}'
Response
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Hmm","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"input_tokens":11,"output_tokens":3},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"input_tokens":11,"output_tokens":4},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:3
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"user","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":16,"input_tokens":11,"output_tokens":5},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:4
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"asks","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":17,"input_tokens":11,"output_tokens":6},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:5
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"\"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":18,"input_tokens":11,"output_tokens":7},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
......
id:358
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"help","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":373,"input_tokens":11,"output_tokens":362},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:359
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":",","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":374,"input_tokens":11,"output_tokens":363},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:360
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"Welcome","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":375,"input_tokens":11,"output_tokens":364},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:361
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"at any time","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":376,"input_tokens":11,"output_tokens":365},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:362
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"tell me","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":377,"input_tokens":11,"output_tokens":366},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:363
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:364
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
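If you call the HTTP endpoint directly rather than through an SDK, the `data:` lines in the SSE stream above can be assembled into the reasoning and response fields by hand. The following is a minimal sketch (`collect_dashscope_sse` is a hypothetical helper, not part of any SDK; the payload shape follows the sample events above, and error handling is omitted):

```python
import json

def collect_dashscope_sse(lines):
    """Assemble reasoning_content and content from DashScope SSE data lines."""
    reasoning, answer = "", ""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip id:, event:, and :HTTP_STATUS comment lines
        message = json.loads(line[len("data:"):])["output"]["choices"][0]["message"]
        reasoning += message.get("reasoning_content", "")
        answer += message.get("content", "")
    return reasoning, answer
```

In practice you would feed this function the decoded lines of the streaming HTTP response body, accumulating incremental fragments exactly as the SDK examples do.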
Additionally, the open-source Qwen3 hybrid-thinking models, qwen-plus-2025-04-28, and qwen-turbo-2025-04-28 support dynamic control of thinking mode through prompts. When enable_thinking is set to true, append /no_think to the prompt to disable thinking mode for that turn. To re-enable thinking mode in a multi-turn conversation, append /think to the latest user prompt. The model follows the most recent /think or /no_think instruction.
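Because the soft switch is plain text appended to the latest user message, it needs no extra API parameters. The sketch below illustrates this; `with_soft_switch` is a hypothetical helper, not part of any SDK:

```python
def with_soft_switch(prompt: str, think: bool) -> str:
    """Append the /think or /no_think soft switch to a prompt.

    The model follows the most recent switch in the conversation.
    """
    return f"{prompt} {'/think' if think else '/no_think'}"

# Disable thinking for this turn even though enable_thinking=True:
messages = [{"role": "user", "content": with_soft_switch("Who are you?", think=False)}]
# Later in the conversation, re-enable thinking on the latest user turn:
messages.append({"role": "user", "content": with_soft_switch("Explain step by step.", think=True)})
```

The resulting messages list is then passed to the same streaming call shown in the examples above.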
Limit thinking process length
Deep thinking models can generate long reasoning processes, which increases wait times and consumes more tokens. Use the thinking_budget parameter to cap the number of tokens the model spends on reasoning. When the limit is reached, the model stops thinking and immediately generates a response.
thinking_budget is the model’s maximum chain-of-thought length. For more information, see Model list.
The thinking_budget parameter is supported by Qwen3 (thinking mode) and Kimi.
OpenAI compatible
Python
Sample code
from openai import OpenAI
import os
# Initialize the OpenAI client.
client = OpenAI(
# If the environment variable is not configured, replace the value with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following is the base_url for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/compatible-mode/v1.
# If you use a model in the Beijing region, change the base_url to https://dashscope.aliyuncs.com/compatible-mode/v1.
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
messages = [{"role": "user", "content": "Who are you"}]
completion = client.chat.completions.create(
model="qwen-plus",
messages=messages,
# The enable_thinking parameter enables the thinking process. The thinking_budget parameter sets the maximum number of tokens for the inference process.
extra_body={
"enable_thinking": True,
"thinking_budget": 50
},
stream=True,
stream_options={
"include_usage": True
},
)
reasoning_content = "" # Complete thinking process
answer_content = "" # Complete response
is_answering = False # Indicates whether the response phase has started
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
for chunk in completion:
if not chunk.choices:
print("\nUsage:")
print(chunk.usage)
continue
delta = chunk.choices[0].delta
# Collect only the thinking content.
if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
if not is_answering:
print(delta.reasoning_content, end="", flush=True)
reasoning_content += delta.reasoning_content
# After receiving the content, start generating the response.
if hasattr(delta, "content") and delta.content:
if not is_answering:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
is_answering = True
print(delta.content, end="", flush=True)
answer_content += delta.content
Response
====================Thinking Process====================
Okay, the user asked "Who are you". I need to give a clear and friendly answer. First, I should clarify my identity as Qwen, developed by Tongyi Lab of Alibaba Group. Then, I need to explain my main functions, such as answering
====================Complete Response====================
I am Qwen, a large-scale language model developed by Tongyi Lab of Alibaba Group. I can answer questions, create text, perform logical reasoning, and write code to help and assist users. Is there anything I can help you with?
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';
// Initialize the OpenAI client.
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY, // Read from environment variable
// The following is the base_url for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/compatible-mode/v1.
// If you use a model in the Beijing region, change the base_url to https://dashscope.aliyuncs.com/compatible-mode/v1.
baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
async function main() {
try {
const messages = [{ role: 'user', content: 'Who are you' }];
const stream = await openai.chat.completions.create({
model: 'qwen-plus',
messages,
stream: true,
// The enable_thinking parameter enables the thinking process. The thinking_budget parameter sets the maximum number of tokens for the inference process.
enable_thinking: true,
thinking_budget: 50
});
console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\nUsage:');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Collect only the thinking content.
if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
if (!isAnswering) {
process.stdout.write(delta.reasoning_content);
}
reasoningContent += delta.reasoning_content;
}
// After receiving the content, start generating the response.
if (delta.content !== undefined && delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
} catch (error) {
console.error('Error:', error);
}
}
main();
Response
====================Thinking Process====================
Okay, the user asked "Who are you". I need to provide a clear and accurate answer. First, I should introduce myself as Qwen, developed by Tongyi Lab of Alibaba Group. Next, I should explain my main functions, such as answering questions
====================Complete Response====================
I am Qwen, a large-scale language model independently developed by Tongyi Lab of Alibaba Group. I can perform various tasks such as answering questions, creating text, performing logical reasoning, and writing code. If you have any questions or need help, feel free to ask me at any time!
HTTP
Sample code
curl
# ======= Important =======
# The following is the base_url for Singapore. If you use a model in the Beijing region, replace the base_url with https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# If you use a model in the US (Virginia) region, replace the base_url with https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-plus",
"messages": [
{
"role": "user",
"content": "Who are you"
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"enable_thinking": true,
"thinking_budget": 50
}'
Response
data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
.....
data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":10,"completion_tokens":360,"total_tokens":370},"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
data: [DONE]
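The OpenAI-compatible stream uses a different chunk shape than the DashScope endpoint: each `data:` line carries a `chat.completion.chunk` object, the final usage-only chunk has an empty `choices` array, and the stream ends with a `[DONE]` sentinel. A minimal sketch of assembling it by hand (`parse_openai_sse` is a hypothetical helper; field names follow the sample chunks above):

```python
import json

def parse_openai_sse(lines):
    """Collect reasoning text, answer text, and usage from OpenAI-compatible SSE lines."""
    reasoning, answer, usage = "", "", None
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if not chunk["choices"]:          # final chunk carries only usage
            usage = chunk["usage"]
            continue
        delta = chunk["choices"][0]["delta"]
        reasoning += delta.get("reasoning_content") or ""
        answer += delta.get("content") or ""
    return reasoning, answer, usage
```

The `or ""` guards matter because, as the samples show, `content` and `reasoning_content` can be JSON `null` in individual chunks.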
DashScope
The DashScope API for Qwen3.5 uses a multimodal interface, so the following example returns a URL error. For the correct invocation method, see Enable or disable thinking mode.
Python
Sample code
import os
from dashscope import Generation
import dashscope
# The following is the base_url for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
# If you use a model in the Beijing region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"
messages = [{"role": "user", "content": "Who are you?"}]
completion = Generation.call(
# If the environment variable is not configured, replace the following line with your Model Studio API key: api_key = "sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen-plus",
messages=messages,
result_format="message",
enable_thinking=True,
# Set the maximum number of tokens for the inference process.
thinking_budget=50,
stream=True,
incremental_output=True,
)
# Define the complete thinking process.
reasoning_content = ""
# Define the complete response.
answer_content = ""
# Determine whether to end the thinking process and start the response.
is_answering = False
print("=" * 20 + "Thinking Process" + "=" * 20)
for chunk in completion:
# If both the thinking process and the response are empty, ignore the chunk.
if (
chunk.output.choices[0].message.content == ""
and chunk.output.choices[0].message.reasoning_content == ""
):
pass
else:
# If it is currently in the thinking process.
if (
chunk.output.choices[0].message.reasoning_content != ""
and chunk.output.choices[0].message.content == ""
):
print(chunk.output.choices[0].message.reasoning_content, end="", flush=True)
reasoning_content += chunk.output.choices[0].message.reasoning_content
# If it is currently in the response phase.
elif chunk.output.choices[0].message.content != "":
if not is_answering:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20)
is_answering = True
print(chunk.output.choices[0].message.content, end="", flush=True)
answer_content += chunk.output.choices[0].message.content
# To print the complete thinking process and the complete response, uncomment and run the following code.
# print("=" * 20 + "Complete Thinking Process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(f"{answer_content}")
Response
====================Thinking Process====================
Okay, the user asked "Who are you?". I need to give a clear and friendly answer. First, I should introduce myself as Qwen, developed by Tongyi Lab of Alibaba Group. Next, I should explain my main functions, such as
====================Complete Response====================
I am Qwen, a large-scale language model independently developed by Tongyi Lab of Alibaba Group. I can answer questions, create text, perform logical reasoning, and write code to provide users with comprehensive, accurate, and useful information and help. Is there anything I can help you with?
Java
Sample code
// DashScope SDK version >= 2.19.4
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the base_url for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
// If you use a model in the Beijing region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(GenerationResult message) {
String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
String content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (reasoning != null && !reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Thinking Process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (content != null && !content.isEmpty()) {
finalContent.append(content);
if (!isFirstPrint) {
System.out.println("\n====================Complete Response====================");
isFirstPrint = true;
}
System.out.print(content);
}
}
private static GenerationParam buildGenerationParam(Message userMsg) {
return GenerationParam.builder()
// If the environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-plus")
.enableThinking(true)
.thinkingBudget(50)
.incrementalOutput(true)
.resultFormat("message")
.messages(Arrays.asList(userMsg))
.build();
}
public static void streamCallWithMessage(Generation gen, Message userMsg)
throws NoApiKeyException, ApiException, InputRequiredException {
GenerationParam param = buildGenerationParam(userMsg);
Flowable<GenerationResult> result = gen.streamCall(param);
result.blockingForEach(message -> handleGenerationResult(message));
}
public static void main(String[] args) {
try {
Generation gen = new Generation();
Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
streamCallWithMessage(gen, userMsg);
// Print the final result.
// if (finalContent.length() > 0) {
// System.out.println("\n====================Complete Response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
}
}
Response
====================Thinking Process====================
Okay, the user asked "Who are you?". I need to give a clear and friendly answer. First, I should introduce myself as Qwen, developed by Tongyi Lab of Alibaba Group. Next, I should explain my main functions, such as
====================Complete Response====================
I am Qwen, a large-scale language model independently developed by Tongyi Lab of Alibaba Group. I can answer questions, create text, perform logical reasoning, and write code to provide users with comprehensive, accurate, and useful information and help. Is there anything I can help you with?
HTTP
Sample code
curl
# ======= Important =======
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# If you use a model in the US (Virginia) region, replace the URL with https://dashscope-us.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === Delete this comment before execution ===
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
"model": "qwen-plus",
"input":{
"messages":[
{
"role": "user",
"content": "Who are you?"
}
]
},
"parameters":{
"enable_thinking": true,
"thinking_budget": 50,
"incremental_output": true,
"result_format": "message"
}
}'
Response
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"OK","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"output_tokens":3,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":1}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"output_tokens":4,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":2}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
......
id:133
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":149,"output_tokens":138,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":50}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
id:134
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":149,"output_tokens":138,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":50}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
Other features
Billing
-
Charges for thinking content are based on output tokens.
-
Some hybrid thinking models have different prices for thinking mode and non-thinking mode.
If a model does not output the thinking process in thinking mode, it is billed at the non-thinking mode price.
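Since thinking content is billed as output tokens, the reasoning share of a call can be read from the usage object returned in the last chunk. A sketch using the field names from the OpenAI-compatible usage sample above (`output_token_breakdown` is a hypothetical helper):

```python
def output_token_breakdown(usage: dict) -> dict:
    """Split billed output tokens into reasoning and visible-response portions.

    Both portions are charged at the output-token price.
    """
    reasoning = usage["completion_tokens_details"]["reasoning_tokens"]
    return {
        "reasoning_tokens": reasoning,
        "response_tokens": usage["completion_tokens"] - reasoning,
        "total_output_tokens": usage["completion_tokens"],
    }

# Using the usage sample from the OpenAI-compatible response above:
usage = {"completion_tokens": 288, "completion_tokens_details": {"reasoning_tokens": 188}}
breakdown = output_token_breakdown(usage)
```

Tracking this split helps estimate how much of your spend a thinking_budget cap would save.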
FAQ
Q: How do I disable thinking mode?
Q: Which models support non-streaming output?
Q: How do I purchase tokens after my free quota runs out?
Q: Can I upload images or documents to ask questions?
Q: How do I view token usage and call counts?
API reference
See the input and output parameters for deep thinking models in Qwen.
Error codes
If execution fails, see Error messages for troubleshooting.