Deep thinking models reason before generating a response to improve accuracy on complex tasks such as logical reasoning and numerical calculation.
Usage
Alibaba Cloud Model Studio provides APIs for deep thinking models in two modes: hybrid thinking mode and thinking-only mode.
Hybrid thinking mode: Use the enable_thinking parameter to enable or disable thinking mode:
When set to true, the model thinks before responding.
When set to false, the model responds directly.
OpenAI compatible
# Import dependencies and create a client...
completion = client.chat.completions.create(
    model="qwen-plus",  # Select a model
    messages=[{"role": "user", "content": "Who are you"}],
    # Since enable_thinking is not a standard OpenAI parameter, pass it in extra_body.
    extra_body={"enable_thinking": True},
    # Enable streaming output.
    stream=True,
    # Configure the stream to include token consumption information in the last data packet.
    stream_options={"include_usage": True}
)
DashScope
The DashScope API for the Qwen3.5 series uses a multimodal interface. The following example results in a url error. For the correct usage, see Enable or disable thinking mode.
# Import dependencies...
response = Generation.call(
    # If you have not set the environment variable, replace the next line with your API key, for example: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # You can use other deep thinking models.
    model="qwen-plus",
    messages=messages,
    result_format="message",
    enable_thinking=True,
    stream=True,
    incremental_output=True
)
Thinking-only mode: The model always thinks before responding, and this behavior cannot be disabled. The request format is the same as in hybrid thinking mode, but you do not need to set the enable_thinking parameter.
The API returns reasoning content in the reasoning_content field and the response in the content field. Deep thinking models reason before responding, which increases latency. Since most of these models support only streaming output, all examples use streaming calls.
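The streaming pattern used throughout this topic reduces to one loop that accumulates the two fields separately. The following is an illustrative sketch, not part of any SDK: collect_stream is a hypothetical helper, and the chunk shape assumed is the OpenAI-compatible streaming format shown in the examples below.

```python
# Illustrative sketch (hypothetical helper, not an SDK function):
# accumulate reasoning_content and content from a stream of chunks
# shaped like the OpenAI-compatible streaming response.
def collect_stream(chunks):
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # the final packet carries only token usage
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):
            reasoning_parts.append(delta.reasoning_content)
        if getattr(delta, "content", None):
            answer_parts.append(delta.content)
    return "".join(reasoning_parts), "".join(answer_parts)
```

Because reasoning always streams before the answer, the helper returns the complete thinking process and the complete response once the stream ends.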
Supported models
Qwen3.5
Commercial edition
Qwen3.5 Plus series (hybrid thinking mode, enabled by default): qwen3.5-plus, qwen3.5-plus-2026-02-15
Qwen3.5 Flash series (hybrid thinking mode, enabled by default): qwen3.5-flash, qwen3.5-flash-2026-02-23
Open source edition
Hybrid thinking mode (enabled by default): qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b
Qwen3
Commercial edition
Qwen Max series (hybrid thinking mode, disabled by default): qwen3-max-2026-01-23, qwen3-max-preview
Qwen Plus series (hybrid thinking mode, disabled by default): qwen-plus, qwen-plus-latest, qwen-plus-2025-04-28 and later snapshots
Qwen Flash series (hybrid thinking mode, disabled by default): qwen-flash, qwen-flash-2025-07-28 and later snapshots
Qwen Turbo series (hybrid thinking mode, disabled by default): qwen-turbo, qwen-turbo-latest, qwen-turbo-2025-04-28 and later snapshots
Open source edition
Hybrid thinking mode (enabled by default): qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b
Thinking-only mode: qwen3-next-80b-a3b-thinking, qwen3-235b-a22b-thinking-2507, qwen3-30b-a3b-thinking-2507
QwQ (Qwen2.5)
Thinking-only mode: qwq-plus, qwq-plus-latest, qwq-plus-2025-03-05, qwq-32b
DeepSeek (Beijing)
Hybrid thinking mode (disabled by default): deepseek-v3.2, deepseek-v3.2-exp, deepseek-v3.1
Thinking-only mode: deepseek-r1, deepseek-r1-0528, deepseek-r1 distilled model
GLM (Beijing)
Hybrid thinking mode (enabled by default): glm-5, glm-4.7, glm-4.6
Kimi (Beijing)
Thinking-only mode: kimi-k2-thinking
For model names, context windows, pricing, and snapshots, see the model list. For rate limiting, see rate limiting.
Quick start
Prerequisites: You have obtained an API key and configured it as an environment variable. If you use an SDK, install the OpenAI or DashScope SDK. The DashScope Java SDK must be version 2.19.4 or later.
Call the qwen-plus model in thinking mode with streaming output.
OpenAI compatible
Python
Sample code
from openai import OpenAI
import os

# Initialize the OpenAI client.
client = OpenAI(
    # API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If an environment variable is not configured, provide your Model Studio API key directly: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # Configurations vary by region. Modify the base_url based on your region.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
messages = [{"role": "user", "content": "Who are you"}]
completion = client.chat.completions.create(
    model="qwen-plus",  # You can replace this with other deep thinking models as needed.
    messages=messages,
    extra_body={"enable_thinking": True},
    stream=True,
    stream_options={
        "include_usage": True
    },
)
reasoning_content = ""  # Full thinking process
answer_content = ""  # Full response
is_answering = False  # Tracks whether the response phase has started
print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")
for chunk in completion:
    if not chunk.choices:
        print("\nUsage:")
        print(chunk.usage)
        continue
    delta = chunk.choices[0].delta
    # Collect only the reasoning content.
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content
    # When content is received, start responding.
    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content
Response
====================Thinking process====================
The user's query "Who are you?" requires an accurate and friendly response. The answer should first establish my identity as Qwen, developed by Tongyi Lab at Alibaba Cloud. It will then outline key capabilities such as question answering, text generation, and logical reasoning. The language must be simple and the tone approachable. To encourage interaction, I will invite the user to ask more questions. Finally, I'll check that all key details are present, including my names (Qwen, 千问) and developer, to provide a comprehensive answer.
====================Full response====================
Hello! I am Qwen, a large language model developed by Tongyi Lab at Alibaba Group. I can answer questions, generate text, perform logical reasoning, write code, and more, to provide you with high-quality information and services. You can call me Qwen. How can I help you?
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';
// Initialize the OpenAI client.
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY, // Read from an environment variable.
// The following is the base URL for the Singapore region. For models in the US (Virginia) region, change the base URL to https://dashscope-us.aliyuncs.com/compatible-mode/v1.
baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
async function main() {
    try {
        const messages = [{ role: 'user', content: 'Who are you' }];
        const stream = await openai.chat.completions.create({
            model: 'qwen-plus',
            messages,
            stream: true,
            // Request token usage in the final chunk so the usage output below is populated.
            stream_options: { include_usage: true },
            enable_thinking: true
        });
        console.log('\n' + '='.repeat(20) + 'Thinking process' + '='.repeat(20) + '\n');
        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\nUsage:');
                console.log(chunk.usage);
                continue;
            }
            const delta = chunk.choices[0].delta;
            // Collect only the reasoning content.
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }
            // When content is received, start responding.
            if (delta.content !== undefined && delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}
main();
Response
====================Thinking process====================
The user's direct query "Who are you?" requires a concise and clear response. The answer will state my identity as Qwen, a large language model from Alibaba Cloud. It will mention key functions like question answering, text generation, and logical reasoning, and highlight multilingual support (Chinese, English). To remain concise, use cases will be mentioned briefly, if at all. The tone will be friendly, and the response will end with an invitation for further questions. Finally, I'll check to ensure accuracy without including unnecessary details like version numbers.
====================Full response====================
I am Qwen, a large language model developed by Tongyi Lab at Alibaba Group. I can perform a variety of tasks, including answering questions, generating text, logical reasoning, and coding, and I support multiple languages, including Chinese and English. If you have any questions or need help, feel free to ask me at any time!
HTTP
Sample code
curl
# ======= Important =======
# The following URL is for the Singapore region. For the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# For the US (Virginia) region, replace it with: https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
# === Remove this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-plus",
"messages": [
{
"role": "user",
"content": "Who are you"
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"enable_thinking": true
}'
Response
data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
.....
data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":10,"completion_tokens":360,"total_tokens":370},"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
data: [DONE]
DashScope
Because the DashScope API for the Qwen series uses a multimodal interface, the following text-generation example is incorrect and returns a url error. For the correct API call, see Enable or disable thinking mode.
Python
Sample code
import os
from dashscope import Generation
import dashscope

# Configurations vary by region. Modify this value based on your region.
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"
messages = [{"role": "user", "content": "Who are you?"}]
completion = Generation.call(
    # If an environment variable is not configured, replace the following line with your Model Studio API key: api_key = "sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen-plus",
    messages=messages,
    result_format="message",
    enable_thinking=True,
    stream=True,
    incremental_output=True,
)
# Full thinking process
reasoning_content = ""
# Full response
answer_content = ""
# Tracks whether the response phase has started.
is_answering = False
print("=" * 20 + "Thinking process" + "=" * 20)
for chunk in completion:
    # If both the thinking and response content are empty, ignore the chunk.
    if (
        chunk.output.choices[0].message.content == ""
        and chunk.output.choices[0].message.reasoning_content == ""
    ):
        pass
    else:
        # If the current chunk is part of the thinking process.
        if (
            chunk.output.choices[0].message.reasoning_content != ""
            and chunk.output.choices[0].message.content == ""
        ):
            print(chunk.output.choices[0].message.reasoning_content, end="", flush=True)
            reasoning_content += chunk.output.choices[0].message.reasoning_content
        # If the current chunk is part of the response.
        elif chunk.output.choices[0].message.content != "":
            if not is_answering:
                print("\n" + "=" * 20 + "Full response" + "=" * 20)
                is_answering = True
            print(chunk.output.choices[0].message.content, end="", flush=True)
            answer_content += chunk.output.choices[0].message.content
# To print the full thinking process and full response, uncomment the following code.
# print("=" * 20 + "Full thinking process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Full response" + "=" * 20 + "\n")
# print(f"{answer_content}")
Response
====================Thinking process====================
To answer the query "Who are you?", the response must state my identity as Qwen, a large language model from Alibaba Cloud. It will then explain my purpose as a helpful assistant by outlining key functions like question answering, text generation, and logical reasoning. The response will maintain a conversational tone, avoiding jargon. To encourage further engagement, it will end with an open-ended question. Finally, I'll check for clarity, conciseness, and a balance between a friendly and professional tone.
====================Full response====================
Hello! I am Qwen, a large-scale language model developed by Alibaba Cloud. I can answer questions, generate text, perform logical reasoning, write code, and more, to provide help and support. Whether you have a question about daily life or a professional topic, I will do my best to help. Is there anything I can help you with?
Java
Sample code
// The DashScope SDK version must be 2.19.4 or later.
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;
public class Main {
    static {
        // The following sets the base URL for the Singapore region. For models in the US (Virginia) region, change the URL to https://dashscope-us.aliyuncs.com/api/v1.
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;

    private static void handleGenerationResult(GenerationResult message) {
        String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String content = message.getOutput().getChoices().get(0).getMessage().getContent();
        if (!reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Thinking process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }
        if (!content.isEmpty()) {
            finalContent.append(content);
            if (!isFirstPrint) {
                System.out.println("\n====================Full response====================");
                isFirstPrint = true;
            }
            System.out.print(content);
        }
    }

    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                // API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If an environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-plus")
                .enableThinking(true)
                .incrementalOutput(true)
                .resultFormat("message")
                .messages(Arrays.asList(userMsg))
                .build();
    }

    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.blockingForEach(message -> handleGenerationResult(message));
    }

    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
            streamCallWithMessage(gen, userMsg);
            // To print the final result, uncomment the following code.
            // if (finalContent.length() > 0) {
            //     System.out.println("\n====================Full response====================");
            //     System.out.println(finalContent.toString());
            // }
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            logger.error("An exception occurred: {}", e.getMessage());
        }
        System.exit(0);
    }
}
Response
====================Thinking process====================
The response to "Who are you?" must be based on my predefined identity as Qwen, a large language model from Alibaba Cloud. The answer will be conversational, concise, and easy to understand. It will first state my identity, then explain my functions, including text creation, logical reasoning, coding, and multilingual support. The tone will be friendly, and the response will end with an invitation for the user to ask for help, to encourage further interaction.
====================Full response====================
Hello! I am Qwen, a large language model from Alibaba Group. I can answer questions; create text such as stories, official documents, emails, and scripts; perform logical reasoning; write code; express opinions; and even play games. I am proficient in multiple languages, including but not limited to Chinese, English, German, French, and Spanish. Is there anything I can help you with?
HTTP
Sample code
curl
# ======= Important =======
# API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. For the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# For the US (Virginia) region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === Remove this comment before execution ===
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
"model": "qwen-plus",
"input":{
"messages":[
{
"role": "user",
"content": "Who are you?"
}
]
},
"parameters":{
"enable_thinking": true,
"incremental_output": true,
"result_format": "message"
}
}'
Response
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Hmm","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"input_tokens":11,"output_tokens":3},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"input_tokens":11,"output_tokens":4},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:3
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":" the","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":16,"input_tokens":11,"output_tokens":5},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:4
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":" user","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":17,"input_tokens":11,"output_tokens":6},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:5
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":" asks","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":18,"input_tokens":11,"output_tokens":7},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
......
id:358
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"help","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":373,"input_tokens":11,"output_tokens":362},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:359
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":",","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":374,"input_tokens":11,"output_tokens":363},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:360
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" welcome","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":375,"input_tokens":11,"output_tokens":364},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:361
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" to","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":376,"input_tokens":11,"output_tokens":365},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:362
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" tell me","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":377,"input_tokens":11,"output_tokens":366},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:363
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:364
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
Core capabilities
Toggle thinking and non-thinking modes
Thinking mode improves response quality but increases latency and cost. With hybrid thinking models, you can switch between thinking and non-thinking modes based on query complexity:
For queries that do not require complex reasoning (such as casual conversation or simple Q&A), set enable_thinking to false to disable thinking mode.
For queries that require complex reasoning (such as logical reasoning, code generation, or mathematical solutions), set enable_thinking to true to enable thinking mode.
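One way to apply this guidance is to route each request through a simple heuristic before setting the flag. The sketch below is illustrative only: should_enable_thinking and its keyword list are hypothetical, not an official recommendation, and a production router would likely use a classifier or explicit user settings instead.

```python
# Hypothetical routing heuristic (illustrative sketch, not an official API):
# enable thinking mode for prompts that look like they need multi-step reasoning.
REASONING_MARKERS = ("prove", "calculate", "debug", "step by step", "derive")

def should_enable_thinking(user_query: str) -> bool:
    text = user_query.lower()
    # Long prompts or prompts containing reasoning keywords get thinking mode.
    return len(user_query) > 200 or any(m in text for m in REASONING_MARKERS)

# The result is then passed as the enable_thinking parameter, for example
# extra_body={"enable_thinking": should_enable_thinking(query)} with the Python OpenAI SDK.
```

This keeps latency and cost low for casual conversation while reserving thinking mode for queries that benefit from it.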
OpenAI compatibility
enable_thinking is not a standard OpenAI parameter. When using the OpenAI Python SDK, pass it via extra_body. In the Node.js SDK, this is passed as a top-level parameter.
Python
Sample code
from openai import OpenAI
import os

# Initialize the OpenAI client
client = OpenAI(
    # If the environment variable is not set, provide your API key here, e.g., api_key="sk-xxx"
    # API keys are region-specific. Get yours at: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The base_url varies by region. Update it for your deployment's region.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
messages = [{"role": "user", "content": "Who are you?"}]
completion = client.chat.completions.create(
    model="qwen-plus",
    messages=messages,
    # Set enable_thinking in extra_body to enable the reasoning process
    extra_body={"enable_thinking": True},
    stream=True,
    stream_options={
        "include_usage": True
    },
)
reasoning_content = ""  # To store the full reasoning process
answer_content = ""  # To store the full response
is_answering = False  # Flag to indicate the start of the response phase
print("\n" + "=" * 20 + "Reasoning process" + "=" * 20 + "\n")
for chunk in completion:
    if not chunk.choices:
        print("\n" + "=" * 20 + "Token usage" + "=" * 20 + "\n")
        print(chunk.usage)
        continue
    delta = chunk.choices[0].delta
    # Collect reasoning content
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content
    # Once content is received, start collecting the response
    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content
Response
====================Reasoning process====================
The user is asking "Who are you?", which is a request for my identity. My response should introduce me as Qwen from Tongyi Lab, summarize my key capabilities (answering questions, text generation, coding), and mention my multilingual support. The tone should be friendly and concise.
====================Full response====================
I am Qwen, a large-scale language model developed by Tongyi Lab. I can help you answer questions, generate text, write code, express opinions, and more, all while supporting multiple languages. What can I help you with today?
====================Token usage====================
CompletionUsage(completion_tokens=221, prompt_tokens=10, total_tokens=231, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=172, rejected_prediction_tokens=None), prompt_tokens_details=PromptTokensDetails(audio_tokens=None, cached_tokens=0))
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';
// Initialize the OpenAI client
const openai = new OpenAI({
// If the environment variable is not set, provide your API key here, e.g., apiKey: "sk-xxx"
// API keys are region-specific. Get yours at: https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The baseURL varies by region. Update it for your deployment's region.
baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = ''; // To store the full reasoning process
let answerContent = ''; // To store the full response
let isAnswering = false; // Flag to indicate the start of the response phase
async function main() {
    try {
        const messages = [{ role: 'user', content: 'Who are you?' }];
        const stream = await openai.chat.completions.create({
            model: 'qwen-plus',
            messages,
            // Note: In the Node.js SDK, non-standard parameters such as enable_thinking are passed as top-level properties, not in extra_body.
            enable_thinking: true,
            stream: true,
            stream_options: {
                include_usage: true
            },
        });
        console.log('\n' + '='.repeat(20) + 'Reasoning process' + '='.repeat(20) + '\n');
        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\n' + '='.repeat(20) + 'Token usage' + '='.repeat(20) + '\n');
                console.log(chunk.usage);
                continue;
            }
            const delta = chunk.choices[0].delta;
            // Collect reasoning content
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }
            // Once content is received, start collecting the response
            if (delta.content !== undefined && delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}
main();
Response
====================Reasoning process====================
The user's query is "Who are you?". This is an identity question, likely from a new user. My plan is to: 1. Introduce myself as Qwen, a large-scale language model from Tongyi Lab at Alibaba Group. 2. List my main functions (Q&A, content creation, coding). 3. Mention my multilingual support. 4. Maintain a friendly, conversational tone.
====================Full response====================
Hello! I'm Qwen, a large-scale language model independently developed by Tongyi Lab under Alibaba Group. I can help you with tasks like answering questions, generating text (such as stories, official documents, emails, and scripts), performing logical reasoning, and even coding, expressing opinions, and playing games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish.
If you have any questions or need help, just let me know!
====================Token usage====================
{
prompt_tokens: 10,
completion_tokens: 288,
total_tokens: 298,
completion_tokens_details: { reasoning_tokens: 188 },
prompt_tokens_details: { cached_tokens: 0 }
}
HTTP
Sample code
curl
# ======= IMPORTANT =======
# API keys are region-specific. Get yours at: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The endpoint URL varies by region. For details, see: https://www.alibabacloud.com/help/en/model-studio/latest/region-selection
# === Delete these comments before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-plus",
"messages": [
{
"role": "user",
"content": "Who are you?"
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"enable_thinking": true
}'
DashScope
The DashScope API for the Qwen3.5 series uses a multimodal interface. The following examples are incompatible and will return an error. For the correct invocation method, see Enable or disable thinking mode.
Python
Sample code
import os
from dashscope import Generation
import dashscope

# The configuration varies by region. Modify this based on your region.
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"
# Initialize request parameters
messages = [{"role": "user", "content": "Who are you?"}]
completion = Generation.call(
    # If the environment variable is not set, provide your API key here, e.g., api_key="sk-xxx"
    # API keys are region-specific. Get yours at: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen-plus",
    messages=messages,
    result_format="message",  # Set the result format to "message"
    enable_thinking=True,  # Enable the reasoning process
    stream=True,  # Enable streaming output
    incremental_output=True,  # Enable incremental output
)
reasoning_content = ""  # To store the full reasoning process
answer_content = ""  # To store the full response
is_answering = False  # Flag to indicate the start of the response phase
print("\n" + "=" * 20 + "Reasoning process" + "=" * 20 + "\n")
for chunk in completion:
    message = chunk.output.choices[0].message
    # Collect reasoning content
    if message.reasoning_content:
        if not is_answering:
            print(message.reasoning_content, end="", flush=True)
        reasoning_content += message.reasoning_content
    # Once content is received, start collecting the response
    if message.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(message.content, end="", flush=True)
        answer_content += message.content
print("\n" + "=" * 20 + "Token usage" + "=" * 20 + "\n")
print(chunk.usage)
# After the loop finishes, reasoning_content and answer_content contain the full content.
# You can perform further processing here as needed.
# print(f"\n\nFull reasoning process:\n{reasoning_content}")
# print(f"\nFull response:\n{answer_content}")
Response
====================Reasoning process====================
Hmm, the user asks "Who are you?" I need to determine what they want to know. They might be encountering me for the first time or verifying my identity. First, I should introduce my name, Qwen, and explain that I'm a large-scale language model developed by Tongyi Lab. Next, I might need to explain my functions, such as answering questions, generating text, and coding, so the user understands my purpose. I should also mention that I support multiple languages, so international users know they can communicate in their preferred language. Finally, I'll maintain a friendly tone and invite them to ask questions to encourage further interaction. It is important to use simple, easy-to-understand language and avoid too much technical jargon. The user might have deeper needs, such as testing my abilities or seeking help, so providing specific examples like writing stories, official documents, or emails would be better. I also need to ensure the response structure is clear, outlining my functions, but perhaps a natural transition is better than bullet points. Additionally, I should emphasize that I am an AI assistant without personal consciousness and that all my answers are based on training data to avoid misunderstandings. I might need to check if I have missed any important information, such as my multimodal capabilities or recent updates, but based on previous responses, it is probably not necessary to go into too much detail. In short, the response should be comprehensive yet concise, friendly, and helpful, making the user feel understood and supported.
====================Full response====================
I am Qwen, a large-scale language model independently developed by Tongyi Lab under Alibaba Group. I can help you with:
1. **Answering questions**: Whether they are about academic, general knowledge, or domain-specific topics, I will do my best to assist.
2. **Creating text**: I can help you write stories, official documents, emails, scripts, and more.
3. **Logical reasoning**: I can assist you with logical analysis and problem-solving.
4. **Programming**: I understand and can generate code in various programming languages.
5. **Multilingual support**: I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish.
If you have any questions or need help, feel free to let me know!
====================Token usage====================
{"input_tokens": 11, "output_tokens": 405, "total_tokens": 416, "output_tokens_details": {"reasoning_tokens": 256}, "prompt_tokens_details": {"cached_tokens": 0}}
Java
Sample code
// DashScope SDK version 2.19.4 or later is required.
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class Main {
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(GenerationResult message) {
String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
String content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (reasoning != null && !reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Reasoning process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (content != null && !content.isEmpty()) {
finalContent.append(content);
if (!isFirstPrint) {
System.out.println("\n====================Full response====================");
isFirstPrint = true;
}
System.out.print(content);
}
}
private static GenerationParam buildGenerationParam(Message userMsg) {
return GenerationParam.builder()
// If the environment variable is not set, provide your API key here, e.g., .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-plus")
.enableThinking(true)
.incrementalOutput(true)
.resultFormat("message")
.messages(Arrays.asList(userMsg))
.build();
}
public static void streamCallWithMessage(Generation gen, Message userMsg)
throws NoApiKeyException, ApiException, InputRequiredException {
GenerationParam param = buildGenerationParam(userMsg);
Flowable<GenerationResult> result = gen.streamCall(param);
result.blockingForEach(Main::handleGenerationResult);
}
public static void main(String[] args) {
try {
// The base_url varies by region.
Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
streamCallWithMessage(gen, userMsg);
// Print the final result.
// if (reasoningContent.length() > 0) {
// System.out.println("\n====================Full Response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
}
}
Response
====================Reasoning process====================
The user is asking "Who are you?". This is likely an identity query or a capability test. My response should clearly state my identity (Qwen, from Alibaba Group), briefly list my capabilities, mention multilingual support, and maintain a friendly tone to encourage interaction.
====================Full response====================
I am Qwen, a large-scale language model from Alibaba Group. I can answer questions, create text (such as stories, official documents, emails, and scripts), perform logical reasoning, code, express opinions, and even play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to ask me anytime!
HTTP
Sample code
curl
# ======= IMPORTANT =======
# API keys are region-specific. Get yours at: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The endpoint URL varies by region. Update it based on your deployment.
# === Delete these comments before execution ===
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
"model": "qwen-plus",
"input":{
"messages":[
{
"role": "user",
"content": "Who are you?"
}
]
},
"parameters":{
"enable_thinking": true,
"incremental_output": true,
"result_format": "message"
}
}'
Response
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Hmm","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"input_tokens":11,"output_tokens":3},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"input_tokens":11,"output_tokens":4},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:3
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"user","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":16,"input_tokens":11,"output_tokens":5},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:4
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"asks","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":17,"input_tokens":11,"output_tokens":6},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:5
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"\"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":18,"input_tokens":11,"output_tokens":7},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
......
id:358
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"Help","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":373,"input_tokens":11,"output_tokens":362},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:359
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":",","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":374,"input_tokens":11,"output_tokens":363},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:360
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"Feel free","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":375,"input_tokens":11,"output_tokens":364},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:361
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"to","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":376,"input_tokens":11,"output_tokens":365},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:362
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"let me know","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":377,"input_tokens":11,"output_tokens":366},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:363
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:364
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
Additionally, for the open-source Qwen3.5 hybrid thinking models, as well as the qwen-plus-2025-04-28 and qwen-turbo-2025-04-28 models, you can control thinking mode directly from the prompt. When enable_thinking is set to true, append /no_think to a prompt to disable thinking mode for that request. To re-enable it in a multi-turn conversation, append /think to the latest prompt. The model follows the most recent /think or /no_think command.
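As a sketch of this prompt-level switch, you might append the command to the latest user message before sending the request. The helper name below is ours, not part of any SDK; the request parameters themselves stay unchanged.

```python
def set_thinking_command(messages, think: bool):
    """Append /think or /no_think to the latest user message.

    `messages` is a standard chat message list. The model honors the most
    recent command, so only the last user turn needs the suffix.
    """
    command = "/think" if think else "/no_think"
    updated = [dict(m) for m in messages]  # avoid mutating the caller's list
    for m in reversed(updated):
        if m.get("role") == "user":
            m["content"] = f'{m["content"]} {command}'
            break
    return updated

messages = [{"role": "user", "content": "Who are you"}]
messages = set_thinking_command(messages, think=False)
print(messages[0]["content"])  # Who are you /no_think
```

Pass the modified messages list to the API call as usual; nothing else about the request changes.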
Limit thinking length
Deep thinking models can produce lengthy thinking processes, which increases latency and token consumption. Use the thinking_budget parameter to limit the number of thinking tokens. When the thinking process reaches this limit, the model stops thinking and generates its response immediately.
The thinking_budget defaults to the model's maximum chain-of-thought length. For details, see the Model list. The thinking_budget parameter is supported by Qwen3 (in thinking mode) and Kimi models.
OpenAI compatible
Python
Sample code
from openai import OpenAI
import os

# Initialize the OpenAI client.
client = OpenAI(
    # If the environment variable is not configured, replace "sk-xxx" with your Model Studio API key.
    # API keys are region-specific. To get an API key, visit https://www.alibabacloud.com/help/en/model-studio/get-api-key.
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # Configurations vary by region. Modify the base_url according to your region.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

messages = [{"role": "user", "content": "Who are you"}]
completion = client.chat.completions.create(
    model="qwen-plus",
    messages=messages,
    # The enable_thinking parameter enables the thinking process, and thinking_budget sets its token limit.
    extra_body={
        "enable_thinking": True,
        "thinking_budget": 50
    },
    stream=True,
    stream_options={
        "include_usage": True
    },
)

reasoning_content = ""  # Complete thinking process
answer_content = ""     # Complete response
is_answering = False    # Tracks if the response phase has started

print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")
for chunk in completion:
    if not chunk.choices:
        print("\nUsage:")
        print(chunk.usage)
        continue
    delta = chunk.choices[0].delta
    # Collect only the thinking content.
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content
    # When content is received, the response phase begins.
    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Complete response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content
Response
====================Thinking process====================
Okay, the user is asking, "Who are you?" I need to provide a clear and friendly response. First, I should state my identity as Qwen, developed by Tongyi Lab at Alibaba Group. Next, I need to explain my main functions, such as answering
====================Complete response====================
I am Qwen, a large-scale language model developed by Tongyi Lab at Alibaba Group. I can answer questions, create text, perform logical reasoning, and write code.
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';
// Initialize the OpenAI client.
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY, // Read from an environment variable.
// Configurations vary by region. Modify the baseURL according to your region.
baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
async function main() {
try {
const messages = [{ role: 'user', content: 'Who are you' }];
const stream = await openai.chat.completions.create({
model: 'qwen-plus',
messages,
stream: true,
// The enable_thinking parameter enables the thinking process, and thinking_budget sets its token limit.
enable_thinking: true,
thinking_budget: 50
});
console.log('\n' + '='.repeat(20) + 'Thinking process' + '='.repeat(20) + '\n');
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\nUsage:');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Collect only the thinking content.
if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
if (!isAnswering) {
process.stdout.write(delta.reasoning_content);
}
reasoningContent += delta.reasoning_content;
}
// When content is received, the response phase begins.
if (delta.content !== undefined && delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Complete response' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
} catch (error) {
console.error('Error:', error);
}
}
main();
Response
====================Thinking process====================
Okay, the user is asking, "Who are you?" I need to provide a clear and accurate response. First, I should state my identity as Qwen, developed by Tongyi Lab at Alibaba Group. Next, I should explain my main functions, such as answering questions
====================Complete response====================
I am Qwen, a large-scale language model developed by Tongyi Lab at Alibaba Group. I can answer questions, create text, perform logical reasoning, and write code.
HTTP
Sample code
curl
# ======= Important =======
# The following is the base URL for the Singapore region. For models in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# For models in the US (Virginia) region, replace the URL with: https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
# === Remove this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-plus",
"messages": [
{
"role": "user",
"content": "Who are you"
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"enable_thinking": true,
"thinking_budget": 50
}'
Response
data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
.....
data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":10,"completion_tokens":360,"total_tokens":370},"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
data: [DONE]
DashScope
The DashScope API for the Qwen3.5 series uses a multimodal interface. The following example returns a URL error. For the correct API call, see Enable or disable thinking mode.
Python
Sample code
import os
import dashscope
from dashscope import Generation

# The base_url varies by region. Modify it according to your region.
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"

messages = [{"role": "user", "content": "Who are you?"}]
completion = Generation.call(
    # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key = "sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen-plus",
    messages=messages,
    result_format="message",
    enable_thinking=True,
    # Sets the token limit for the thinking process.
    thinking_budget=50,
    stream=True,
    incremental_output=True,
)

# Stores the complete thinking process.
reasoning_content = ""
# Stores the complete response.
answer_content = ""
# Tracks if the response phase has started.
is_answering = False

print("=" * 20 + "Thinking process" + "=" * 20)
for chunk in completion:
    message = chunk.output.choices[0].message
    # Ignore chunks where both thinking content and response content are empty.
    if message.content == "" and message.reasoning_content == "":
        continue
    # If the current chunk contains thinking content.
    if message.reasoning_content != "" and message.content == "":
        print(message.reasoning_content, end="", flush=True)
        reasoning_content += message.reasoning_content
    # If the current chunk contains response content.
    elif message.content != "":
        if not is_answering:
            print("\n" + "=" * 20 + "Complete response" + "=" * 20)
            is_answering = True
        print(message.content, end="", flush=True)
        answer_content += message.content

# To print the complete thinking process and response, uncomment and run the following code.
# print("=" * 20 + "Complete thinking process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Complete response" + "=" * 20 + "\n")
# print(f"{answer_content}")
Response
====================Thinking process====================
Okay, the user is asking, "Who are you?" I need to provide a clear and friendly response. First, I must introduce myself as Qwen, developed by Tongyi Lab at Alibaba Group. Next, I should explain my main functions, such as
====================Complete response====================
I am Qwen, a large-scale language model developed by Tongyi Lab at Alibaba Group. I can answer questions, create text, perform logical reasoning, and write code.
Java
Sample code
// The DashScope SDK version must be 2.19.4 or later.
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The base HTTP API URL varies by region. Modify it according to your region.
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(GenerationResult message) {
String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
String content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (reasoning != null && !reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Thinking process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (content != null && !content.isEmpty()) {
finalContent.append(content);
if (!isFirstPrint) {
System.out.println("\n====================Complete response====================");
isFirstPrint = true;
}
System.out.print(content);
}
}
private static GenerationParam buildGenerationParam(Message userMsg) {
return GenerationParam.builder()
// If the environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-plus")
.enableThinking(true)
.thinkingBudget(50)
.incrementalOutput(true)
.resultFormat("message")
.messages(Arrays.asList(userMsg))
.build();
}
public static void streamCallWithMessage(Generation gen, Message userMsg)
throws NoApiKeyException, ApiException, InputRequiredException {
GenerationParam param = buildGenerationParam(userMsg);
Flowable<GenerationResult> result = gen.streamCall(param);
result.blockingForEach(message -> handleGenerationResult(message));
}
public static void main(String[] args) {
try {
Generation gen = new Generation();
Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
streamCallWithMessage(gen, userMsg);
// Print the final result.
// if (reasoningContent.length() > 0) {
// System.out.println("\n====================Complete response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
}
}
Response
====================Thinking process====================
Okay, the user is asking, "Who are you?" I need to provide a clear and friendly response. First, I must introduce myself as Qwen, developed by Tongyi Lab at Alibaba Group. Next, I should explain my main functions, such as
====================Complete response====================
I am Qwen, a large-scale language model developed by Tongyi Lab at Alibaba Group. I can answer questions, create text, perform logical reasoning, and write code.
HTTP
Sample code
curl
# ======= Important =======
# API keys are region-specific. To get an API key, visit https://www.alibabacloud.com/help/en/model-studio/get-api-key.
# The endpoint URL varies by region. Modify it according to your region.
# === Remove this comment before execution ===
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
"model": "qwen-plus",
"input":{
"messages":[
{
"role": "user",
"content": "Who are you?"
}
]
},
"parameters":{
"enable_thinking": true,
"thinking_budget": 50,
"incremental_output": true,
"result_format": "message"
}
}'
Response
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Okay","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"output_tokens":3,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":1}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"output_tokens":4,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":2}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
......
id:133
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":149,"output_tokens":138,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":50}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
id:134
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":149,"output_tokens":138,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":50}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
Other features
Billing
Thinking content is billed as output tokens.
Some hybrid thinking models price their thinking and non-thinking modes differently.
If a model in thinking mode does not output a thinking process, the request is billed at the non-thinking mode price.
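To see how thinking tokens factor into the bill, you can split a usage record (shaped like the usage JSON returned in the examples above) into reasoning and response tokens. This is a minimal sketch; the unit price here is a placeholder, not actual Model Studio pricing.

```python
def summarize_usage(usage: dict, price_per_1k_output: float) -> dict:
    """Break output tokens down into thinking vs. response tokens.

    `usage` follows the shape returned by the API, for example:
    {"input_tokens": 11, "output_tokens": 405,
     "output_tokens_details": {"reasoning_tokens": 256}}
    """
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    output = usage["output_tokens"]
    return {
        "reasoning_tokens": reasoning,
        "response_tokens": output - reasoning,
        # Thinking content is billed as output tokens.
        "output_cost": output / 1000 * price_per_1k_output,
    }

usage = {"input_tokens": 11, "output_tokens": 405, "total_tokens": 416,
         "output_tokens_details": {"reasoning_tokens": 256}}
print(summarize_usage(usage, price_per_1k_output=0.002))
```

For real costs, substitute the published per-token price of the model and mode you are using.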
FAQ
Q: How do I disable thinking mode?
Q: Which models support non-streaming output?
Q: How do I purchase tokens after the free quota runs out?
Q: Can I upload images or documents?
Q: How do I view token usage and call counts?
API
For the input and output parameters, see Qwen.
Error codes
If execution fails, see Error messages.
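As a rough illustration of handling a failed call: DashScope Python SDK responses expose status_code, code, and message fields, which you can check before reading the output. The helper below is our sketch, not part of the SDK, and the exact response shape should be verified against the SDK you are using.

```python
from http import HTTPStatus
from types import SimpleNamespace

def describe_failure(response):
    """Return a human-readable error string, or None if the call succeeded.

    `response` is any object exposing status_code, code, and message,
    as DashScope response objects do.
    """
    if response.status_code == HTTPStatus.OK:
        return None
    return f"request failed: code={response.code}, message={response.message}"

# Stand-in response object for illustration; a real one comes from Generation.call.
bad = SimpleNamespace(status_code=429, code="Throttling", message="Requests throttled")
print(describe_failure(bad))  # request failed: code=Throttling, message=Requests throttled
```

When an error occurs, look up the returned code in the Error messages reference to find the cause and fix.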
