    Alibaba Cloud Model Studio: Deep thinking


    Last Updated: Jun 13, 2025

    The Qwen3 and QwQ (based on Qwen2.5) models have powerful reasoning capabilities: they first output the thinking process and then the response content.

    Model overview

    Qwen3

    Qwen3 supports thinking and non-thinking modes, allowing you to switch between the two using the enable_thinking parameter. In addition to this, the model's capabilities have been significantly enhanced:

    1. Reasoning capability: The model has significantly outperformed QwQ and non-thinking models of the same size in evaluations of mathematics, coding, and logical reasoning, reaching SOTA performance at its size.

    2. Human preference following: Its abilities in creative writing, role-playing, multi-turn conversation, and instruction following have improved greatly, surpassing models of similar size in general capability.

    3. Agent capability: The model achieves industry-leading levels in both thinking and non-thinking modes, enabling precise external tool invocation.

    4. Multilingual capability: The model supports over 100 languages and dialects, with marked improvements in multilingual translation, instruction comprehension, and common sense reasoning abilities.

    5. Response format fixes: Issues with response formats in earlier versions, such as anomalous Markdown, mid-text truncation, and incorrect boxed outputs, have been fixed.

    When enable_thinking is set to true, there is a very small probability that no reasoning content is output.
    The thinking mode supports only incremental output.
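    Because thinking mode streams the reasoning and the answer as separate incremental fields, client code typically accumulates reasoning_content and content separately. The following SDK-independent sketch shows only that accumulation logic; the split_stream helper and the plain-dict delta shape are illustrative, not part of any SDK.

```python
def split_stream(deltas):
    """Accumulate reasoning_content and content deltas separately.

    `deltas` mimics chunk.choices[0].delta objects as plain dicts (illustrative).
    """
    reasoning_parts, answer_parts = [], []
    for delta in deltas:
        if delta.get("reasoning_content"):  # thinking-phase tokens
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):            # response-phase tokens
            answer_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(answer_parts)

# Example with a fabricated stream:
chunks = [
    {"reasoning_content": "The user greets"},
    {"reasoning_content": " me."},
    {"content": "Hello"},
    {"content": "!"},
]
reasoning, answer = split_stream(chunks)
print(reasoning)  # The user greets me.
print(answer)     # Hello!
```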

    Commercial models

    Only the latest and 0428 versions of Qwen-Plus and Qwen-Turbo belong to the Qwen3 series and support the thinking mode.
    The thinking mode is disabled by default for the commercial models. You must first set enable_thinking to true.

    Qwen-Plus

    | Name | Version | Context window (tokens) | Maximum input (tokens) | Maximum CoT (tokens) | Maximum response (tokens) | Input price (per million tokens) | Output price (per million tokens) |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | qwen-plus-latest (always matches the latest snapshot) | Latest | 131,072 | 98,304 | 38,912 | 16,384 | $0.4 | $8 |
    | qwen-plus-2025-04-28 (also qwen-plus-0428) | Snapshot | 131,072 | 98,304 | 38,912 | 16,384 | $0.4 | $8 |

    Free quota: 1 million tokens for each model, valid for 180 days after activation.

    Qwen-Turbo

    | Name | Version | Context window (tokens) | Maximum input (tokens) | Maximum CoT (tokens) | Maximum response (tokens) | Input price (per million tokens) | Output price (per million tokens) |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | qwen-turbo-latest (always matches the latest snapshot) | Latest | 131,072 | 98,304 | 38,912 | 16,384 | $0.05 | $1 |
    | qwen-turbo-2025-04-28 (also qwen-turbo-0428) | Snapshot | 131,072 | 98,304 | 38,912 | 16,384 | $0.05 | $1 |

    Free quota: 1 million tokens for each model, valid for 180 days after activation.

    Open source models

    The thinking mode is enabled by default for open source models. To disable it, set enable_thinking to false.
    Open source Qwen3 supports only streaming output, in both thinking and non-thinking modes.
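    As a quick reference, an open source Qwen3 request can be sketched as the following parameter set (the model name is illustrative; the keys mirror the parameters used in the samples below):

```python
# Parameter sketch for calling an open source Qwen3 model (illustrative values).
opensource_params = {
    "model": "qwen3-8b",       # any open source Qwen3 model
    "enable_thinking": False,  # thinking is on by default; set False to disable it
    "stream": True,            # required: open source Qwen3 supports streaming output only
}
print(opensource_params["stream"])  # True
```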

    | Name | Context window (tokens) | Maximum input (tokens) | Maximum CoT (tokens) | Maximum response (tokens) | Input price (per million tokens) | Output price (per million tokens) |
    | --- | --- | --- | --- | --- | --- | --- |
    | qwen3-235b-a22b | 131,072 | 98,304 | 38,912 | 16,384 | $0.7 | $8.4 |
    | qwen3-32b | 131,072 | 98,304 | 38,912 | 16,384 | $0.7 | $8.4 |
    | qwen3-30b-a3b | 131,072 | 98,304 | 38,912 | 16,384 | $0.2 | $2.4 |
    | qwen3-14b | 131,072 | 98,304 | 38,912 | 8,192 | $0.35 | $4.2 |
    | qwen3-8b | 131,072 | 98,304 | 38,912 | 8,192 | $0.18 | $2.1 |
    | qwen3-4b | 131,072 | 98,304 | 38,912 | 8,192 | $0.11 | $1.26 |
    | qwen3-1.7b | 32,768 | 28,672 | 30,720 (CoT + response combined) | — | $0.11 | $1.26 |
    | qwen3-0.6b | 32,768 | 28,672 | 30,720 (CoT + response combined) | — | $0.11 | $1.26 |

    Free quota: 1 million tokens for each model, valid for 180 days after activation.

    QwQ (based on Qwen2.5)

    The QwQ reasoning model, trained on Qwen2.5, achieves significantly improved reasoning capabilities through reinforcement learning. Its performance on core mathematics and coding benchmarks (AIME 24/25, LiveCodeBench) and general benchmarks (IFEval, LiveBench, etc.) reaches the level of DeepSeek-R1.

    Reasoning cannot be disabled.
    Only streaming output is supported.

    Commercial models

    | Name | Version | Context window (tokens) | Maximum input (tokens) | Maximum CoT (tokens) | Maximum response (tokens) | Input price (per million tokens) | Output price (per million tokens) |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | qwq-plus | Stable | 131,072 | 98,304 | 32,768 | 8,192 | $0.8 | $2.4 |

    Free quota: 1 million tokens, valid for 180 days after activation.

    For information about rate limiting, see Rate limits.

    Get started

    Prerequisites: You must have obtained an API key and configured it as an environment variable. To use an SDK, install the OpenAI or DashScope SDK. The DashScope SDK for Java must be version 2.19.4 or later.

    Run the following code to call a deep thinking model in streaming mode. Read the thinking process from the returned reasoning_content field and the response from the content field.

    OpenAI

    Python

    Sample code

    from openai import OpenAI
    import os
    
    # Initialize OpenAI client
    client = OpenAI(
        # If environment variables are not configured, replace with the Model Studio API Key: api_key="sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    
    messages = [{"role": "user", "content": "Who are you"}]
    
    completion = client.chat.completions.create(
        model="qwen-plus-2025-04-28",  # You can replace it with other deep thinking models as needed
        messages=messages,
        # enable_thinking parameter opens the thinking process, this parameter is invalid for QwQ models
        extra_body={"enable_thinking": True},
        stream=True,
        # stream_options={
        #     "include_usage": True
        # },
    )
    
    reasoning_content = ""  # Complete reasoning process
    answer_content = ""  # Complete response
    is_answering = False  # Whether entering the response phase
    print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
    
    for chunk in completion:
        if not chunk.choices:
            print("\nUsage:")
            print(chunk.usage)
            continue
    
        delta = chunk.choices[0].delta
    
        # Only collect reasoning content
        if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
            if not is_answering:
                print(delta.reasoning_content, end="", flush=True)
            reasoning_content += delta.reasoning_content
    
        # Received content, starting to respond
        if hasattr(delta, "content") and delta.content:
            if not is_answering:
                print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
                is_answering = True
            print(delta.content, end="", flush=True)
            answer_content += delta.content
    

    Sample response

    ====================Thinking Process====================
    
    Alright, the user asked "Who are you?" and I need to provide an accurate and friendly response. First, I should confirm my identity: I am Qwen, developed by the Tongyi Lab under Alibaba Group. Next, I should explain my main functions, such as answering questions, creating text, logical reasoning, etc. At the same time, the tone should remain warm and approachable, avoiding overly technical jargon so that the user feels comfortable. It’s also important not to use complex terms and to ensure the response is concise and clear. Additionally, it might be helpful to include some interactive elements, inviting the user to ask more questions to promote further communication. Lastly, I need to check for any missing key information, such as my name "Qwen", along with my association to Alibaba Group and Tongyi Lab. This ensures the response is comprehensive and meets user expectations.
    
    ====================Complete Response====================
    
    Hello! I am Qwen, a super-large-scale language model independently developed by the Tongyi Lab under Alibaba Group. I can answer questions, create text, perform logical reasoning, programming, and more, aiming to provide users with high-quality information and services. You can call me Qwen. Is there anything I can assist you with?

    Node.js

    Sample code

    import OpenAI from "openai";
    import process from 'process';
    
    // Initialize OpenAI client
    const openai = new OpenAI({
        apiKey: process.env.DASHSCOPE_API_KEY, // Read from environment variables
        baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
    });
    
    let reasoningContent = '';
    let answerContent = '';
    let isAnswering = false;
    
    async function main() {
        try {
            const messages = [{ role: 'user', content: 'Who are you?' }];
            const stream = await openai.chat.completions.create({
                // You can replace with other Qwen3 models or QwQ models as needed
                model: 'qwen-plus-2025-04-28',
                messages,
                stream: true,
                // The enable_thinking parameter initiates the reasoning process, which is ineffective for QwQ models
                enable_thinking: true
            });
            console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
    
            for await (const chunk of stream) {
                if (!chunk.choices?.length) {
                    console.log('\nUsage:');
                    console.log(chunk.usage);
                    continue;
                }
    
                const delta = chunk.choices[0].delta;
                
                // Only collect reasoning content
                if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                    if (!isAnswering) {
                        process.stdout.write(delta.reasoning_content);
                    }
                    reasoningContent += delta.reasoning_content;
                }
    
                // Receive content, start responding
                if (delta.content !== undefined && delta.content) {
                    if (!isAnswering) {
                        console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
                        isAnswering = true;
                    }
                    process.stdout.write(delta.content);
                    answerContent += delta.content;
                }
            }
        } catch (error) {
            console.error('Error:', error);
        }
    }
    
    main();
    

    Sample response

    ====================Thinking Process====================
    
    Okay, the user asked "Who are you," and I need to respond with my identity. Firstly, I should clearly state that I am Qwen, a large-scale language model developed by Alibaba Cloud. Next, I can mention my main functions, such as answering questions, generating text, logical reasoning, etc. It's also important to highlight my multilingual support, including Chinese and English, so the user knows I can handle requests in different languages. Additionally, it might be helpful to explain my application scenarios, such as assistance in learning, work, and daily life. However, the user's question is quite direct, so detailed information may not be necessary; it's best to keep it concise and clear. It's important to maintain a friendly tone and invite further questions from the user. Check if there is any missing crucial information, like my version or the latest updates, but the user might not need such detailed information. Finally, ensure the answer is accurate and free of errors.
    
    ====================Complete Response====================
    
    I am Qwen, a large-scale language model independently developed by Tongyi Lab. I can handle various tasks such as answering questions, generating text, logical reasoning, programming, and supporting multiple languages including Chinese and English. If you have any questions or need help, feel free to let me know at any time!

    HTTP

    Sample code

    curl

    For Qwen3, set enable_thinking to true to enable reasoning. enable_thinking is not effective for QwQ.

    curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-plus-2025-04-28",
        "messages": [
            {
                "role": "user", 
                "content": "Who are you?"
            }
        ],
        "stream": true,
        "stream_options": {
            "include_usage": true
        },
        "enable_thinking": true
    }'
    

    Sample response

    data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
    
    .....
    
    data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
    
    data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":10,"completion_tokens":360,"total_tokens":370},"created":1745485391,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
    
    data: [DONE]

    DashScope

    Note

    When using DashScope to call Qwen3:

    • incremental_output must be true.

    • result_format must be "message".

    When using DashScope to call QwQ:

    • incremental_output must be true.

    • result_format defaults to "message".
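    The constraints above can be summarized as parameter sketches (the model names are examples taken from the tables earlier in this topic; everything other than the stated constraints is illustrative):

```python
# Required streaming settings when calling Qwen3 and QwQ through DashScope,
# per the notes above.
qwen3_params = {
    "model": "qwen-plus-2025-04-28",  # a Qwen3 commercial snapshot
    "enable_thinking": True,
    "incremental_output": True,       # must be True for Qwen3
    "result_format": "message",       # must be "message" for Qwen3
}
qwq_params = {
    "model": "qwq-plus",
    "incremental_output": True,       # must be True for QwQ
    # result_format already defaults to "message" for QwQ, so it can be omitted
}
```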

    Python

    Sample code

    import os
    from dashscope import Generation
    import dashscope
    dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"
    
    messages = [{"role": "user", "content": "Who are you?"}]
    
    completion = Generation.call(
        # If the environment variable is not set, please replace the line below with the Model Studio API Key: api_key = "sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # You may switch to other Deep thinking models as needed
        model="qwen-plus-2025-04-28",
        messages=messages,
        result_format="message",
        # Enable Deep thinking; this parameter is ineffective for QwQ models
        enable_thinking=True,
        stream=True,
        incremental_output=True,
    )
    
    # Define complete reasoning process
    reasoning_content = ""
    # Define complete response
    answer_content = ""
    # Determine whether the reasoning process has finished and response has started
    is_answering = False
    
    print("=" * 20 + "Thinking Process" + "=" * 20)
    
    for chunk in completion:
        # Ignore if both reasoning process and response are empty
        if (
            chunk.output.choices[0].message.content == ""
            and chunk.output.choices[0].message.reasoning_content == ""
        ):
            pass
        else:
            # If currently in reasoning process
            if (
                chunk.output.choices[0].message.reasoning_content != ""
                and chunk.output.choices[0].message.content == ""
            ):
                print(chunk.output.choices[0].message.reasoning_content, end="", flush=True)
                reasoning_content += chunk.output.choices[0].message.reasoning_content
            # If currently in response
            elif chunk.output.choices[0].message.content != "":
                if not is_answering:
                    print("\n" + "=" * 20 + "Complete Response" + "=" * 20)
                    is_answering = True
                print(chunk.output.choices[0].message.content, end="", flush=True)
                answer_content += chunk.output.choices[0].message.content
    
    # If you need to print the complete reasoning process and complete response, please uncomment the code below and run it
    # print("=" * 20 + "Complete Thinking Process" + "=" * 20 + "\n")
    # print(f"{reasoning_content}")
    # print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
    # print(f"{answer_content}")
    

    Sample response

    ====================Thinking Process====================
    
    Alright, the user asked, "Who are you?" and I need to answer this question. First, I should clarify my identity, namely Qwen, a large-scale language model developed by Alibaba Cloud. Next, I need to explain my functions and purposes, such as answering questions, generating text, logical reasoning, etc. Moreover, I should emphasize my goal of being a helpful assistant to users, providing help and support.
    
    When expressing, I should keep it conversational and avoid technical jargon or complex sentences. Adding some friendly terms, like "Hello there~", can make the conversation more natural. Additionally, it’s important to ensure the information is accurate and doesn't miss key points, such as my developer, main functions, and usage scenarios.
    
    I should also consider possible follow-up questions from the user, like specific application examples or technical details, so I can plant subtle hints in the response to guide further questions. For example, mentioning "Whether it's everyday inquiries or professional domain questions, I'm here to assist," offers a comprehensive yet inviting approach.
    
    Finally, I need to check if the response flows smoothly and doesn’t contain repetitive or redundant information, making sure it's concise and clear. While keeping a balance between friendliness and professionalism, I should ensure the user feels both welcomed and assured.
    
    ====================Complete Response====================
    
    Hello there~ I'm Qwen, a large-scale language model developed by Alibaba Cloud. I can answer questions, generate text, perform logical reasoning, and even handle programming tasks, aiming to provide help and support to users. Whether it's everyday inquiries or professional domain questions, I'm here to assist. Is there anything I can help you with?

    Java

    Sample code

    // Version of dashscope SDK >= 2.19.4
    import java.util.Arrays;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import com.alibaba.dashscope.aigc.generation.Generation;
    import com.alibaba.dashscope.aigc.generation.GenerationParam;
    import com.alibaba.dashscope.aigc.generation.GenerationResult;
    import com.alibaba.dashscope.common.Message;
    import com.alibaba.dashscope.common.Role;
    import com.alibaba.dashscope.exception.ApiException;
    import com.alibaba.dashscope.exception.InputRequiredException;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import io.reactivex.Flowable;
    import java.lang.System;
    import com.alibaba.dashscope.utils.Constants;
    
    public class Main {
        static {
            Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
        }
        private static final Logger logger = LoggerFactory.getLogger(Main.class);
        private static StringBuilder reasoningContent = new StringBuilder();
        private static StringBuilder finalContent = new StringBuilder();
        private static boolean isFirstPrint = true;
    
        private static void handleGenerationResult(GenerationResult message) {
            String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
            String content = message.getOutput().getChoices().get(0).getMessage().getContent();
    
            if (!reasoning.isEmpty()) {
                reasoningContent.append(reasoning);
                if (isFirstPrint) {
                    System.out.println("====================Thinking Process====================");
                    isFirstPrint = false;
                }
                System.out.print(reasoning);
            }
    
            if (!content.isEmpty()) {
                finalContent.append(content);
                if (!isFirstPrint) {
                    System.out.println("\n====================Complete Response====================");
                    isFirstPrint = true;
                }
                System.out.print(content);
            }
        }
    
        private static GenerationParam buildGenerationParam(Message userMsg) {
            return GenerationParam.builder()
                    // If the environment variable is not set, please replace the line below with the Model Studio API Key: .apiKey("sk-xxx")
                    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                    .model("qwen-plus-2025-04-28")
                    .enableThinking(true)
                    .incrementalOutput(true)
                    .resultFormat("message")
                    .messages(Arrays.asList(userMsg))
                    .build();
        }
    
        public static void streamCallWithMessage(Generation gen, Message userMsg)
                throws NoApiKeyException, ApiException, InputRequiredException {
            GenerationParam param = buildGenerationParam(userMsg);
            Flowable<GenerationResult> result = gen.streamCall(param);
            result.blockingForEach(message -> handleGenerationResult(message));
        }
    
        public static void main(String[] args) {
            try {
                Generation gen = new Generation();
                Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
                streamCallWithMessage(gen, userMsg);
    //             Print the final result
    //            if (reasoningContent.length() > 0) {
    //                System.out.println("\n====================Complete Response====================");
    //                System.out.println(finalContent.toString());
    //            }
            } catch (ApiException | NoApiKeyException | InputRequiredException e) {
                logger.error("An exception occurred: {}", e.getMessage());
            }
            System.exit(0);
        }
    }
    

    Sample response

    ====================Thinking Process====================
    
    Alright, the user asked "Who are you?" and I need to respond based on previous settings. First and foremost, my role is Qwen, a large-scale language model under Alibaba Group. The answer should be conversational and easy to understand.
    
    The user might be new to interacting with me or wants to confirm my identity. I should first directly state who I am and then briefly explain my functions and purposes, such as answering questions, creating text, programming, etc. It's also important to mention my multilingual support so the user knows I can handle requests in different languages.
    
    Additionally, according to guidelines, I should maintain a personable approach, so the tone should be friendly, possibly using emojis to increase friendliness. Moreover, I might need to guide the user toward further questions or using my features, like asking if they need any help.
    
    It's crucial to avoid complex jargon and lengthy explanations. Check for any missing key points, like multilingual support and specific abilities. Ensure the response meets all requirements, including being conversational and concise.
    
    ====================Complete Response====================
    
    Hello! I'm Qwen, a large-scale language model under Alibaba Group. I can answer questions, create text like writing stories, official documents, emails, scripts, perform logical reasoning, programming, and more. I can also express opinions, play games, etc. I am proficient in multiple languages, including but not limited to Chinese, English, German, French, Spanish, and more. Is there anything I can assist you with?

    HTTP

    Sample code

    curl

    For Qwen3, set enable_thinking to true to enable reasoning. enable_thinking is not effective for QwQ.

    curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -H "X-DashScope-SSE: enable" \
    -d '{
        "model": "qwen-plus-2025-04-28",
        "input":{
            "messages":[      
                {
                    "role": "user",
                    "content": "Who are you?"
                }
            ]
        },
        "parameters":{
            "enable_thinking": true,
            "incremental_output": true,
            "result_format": "message"
        }
    }'

    Sample response

    id:1
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Well","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"input_tokens":11,"output_tokens":3},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:2
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":", ","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"input_tokens":11,"output_tokens":4},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:3
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"the user","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":16,"input_tokens":11,"output_tokens":5},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:4
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"asks","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":17,"input_tokens":11,"output_tokens":6},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:5
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"“","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":18,"input_tokens":11,"output_tokens":7},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    ......
    
    id:358
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"help","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":373,"input_tokens":11,"output_tokens":362},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:359
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":", ","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":374,"input_tokens":11,"output_tokens":363},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:360
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"please","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":375,"input_tokens":11,"output_tokens":364},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:361
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"tell","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":376,"input_tokens":11,"output_tokens":365},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:362
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"me","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":377,"input_tokens":11,"output_tokens":366},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:363
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:364
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

    Multi-round conversation

    By default, the API does not store your conversation history. The multi-round conversation feature equips the model with the ability to "remember" past interactions, for scenarios such as follow-up questions and information gathering. The model returns both reasoning_content and content; include only content in the context, as {'role': 'assistant', 'content': <concatenated streaming output content>}. reasoning_content is not required.
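    The context-assembly step above can be sketched as a small helper (the function name is illustrative): concatenate the streamed content chunks and append them as the assistant turn, leaving reasoning_content out.

```python
def build_context(messages, streamed_content_chunks):
    """Return a new message list with the concatenated streamed answer
    appended as the assistant turn; reasoning_content is deliberately excluded."""
    answer = "".join(streamed_content_chunks)
    return messages + [{"role": "assistant", "content": answer}]

# Usage: after streaming a reply, fold it into the history before the next round.
messages = [{"role": "user", "content": "Who are you?"}]
messages = build_context(messages, ["I am ", "Qwen."])
messages.append({"role": "user", "content": "What can you do?"})
print(messages[1])  # {'role': 'assistant', 'content': 'I am Qwen.'}
```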

    OpenAI

    Implement multi-round conversation through OpenAI SDK or OpenAI-compatible HTTP method.

    Python

    Sample code

    from openai import OpenAI
    import os
    
    # Initialize OpenAI client
    client = OpenAI(
        # If the environment variable is not set, please replace the following with the Model Studio API Key: api_key="sk-xxx"
        api_key = os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    )
    
    reasoning_content = ""  # Define complete reasoning process
    answer_content = ""     # Define complete response
    
    messages = []
    conversation_idx = 1
    while True:
        is_answering = False   # Determine whether the reasoning process has finished and the response has started
        reasoning_content = ""  # Reset per round so that only this round's reply is appended to the context
        answer_content = ""
        print("="*20+f"Round {conversation_idx} Conversation"+"="*20)
        conversation_idx += 1
        user_msg = {"role": "user", "content": input("Please enter your message: ")}
        messages.append(user_msg)
        # Create chat completion request
        completion = client.chat.completions.create(
            # You can switch to other deep thinking models as needed
            model="qwen-plus-2025-04-28",
            messages=messages,
            # The enable_thinking parameter initiates the reasoning process, which is ineffective for QwQ models
            extra_body={"enable_thinking": True},
            stream=True,
            # stream_options={
            #     "include_usage": True
            # }
        )
        print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
        for chunk in completion:
            # If chunk.choices is empty, print usage
            if not chunk.choices:
                print("\nUsage:")
                print(chunk.usage)
            else:
                delta = chunk.choices[0].delta
                # Print reasoning process
                if hasattr(delta, 'reasoning_content') and delta.reasoning_content is not None:
                    print(delta.reasoning_content, end='', flush=True)
                    reasoning_content += delta.reasoning_content
                elif delta.content:
                    # Start responding (delta.content can be None on some chunks, so guard for it)
                    if not is_answering:
                        print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
                        is_answering = True
                    # Print the response as it streams
                    print(delta.content, end='', flush=True)
                    answer_content += delta.content
        # Add the model's response content to the context
        messages.append({"role": "assistant", "content": answer_content})
        print("\n")

    Node.js

    Sample code

    import OpenAI from "openai";
    import process from 'process';
    import readline from 'readline/promises';
    
    // Initialize readline interface
    const rl = readline.createInterface({
        input: process.stdin,
        output: process.stdout
    });
    
    // Initialize OpenAI client
    const openai = new OpenAI({
        apiKey: process.env.DASHSCOPE_API_KEY, // Retrieve from environment variables
        baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
    });
    
    let reasoningContent = '';
    let answerContent = '';
    let isAnswering = false;
    let messages = [];
    let conversationIdx = 1;
    
    async function main() {
        while (true) {
            console.log("=".repeat(20) + `Round ${conversationIdx} Conversation` + "=".repeat(20));
            conversationIdx++;
            
            // Read user input
            const userInput = await rl.question("Please enter your message: ");
            messages.push({ role: 'user', content: userInput });
    
            // Reset states
            reasoningContent = '';
            answerContent = '';
            isAnswering = false;
    
            try {
                const stream = await openai.chat.completions.create({
                    // You can switch to other deep thinking models as needed
                    model: 'qwen-plus-2025-04-28',
                    messages: messages,
                    // The enable_thinking parameter enables the reasoning process; it has no effect on QwQ models
                    enable_thinking: true,
                    stream: true,
                    // stream_options:{
                    //     include_usage: true
                    // }
                });
    
                console.log("\n" + "=".repeat(20) + "Thinking Process" + "=".repeat(20) + "\n");
    
                for await (const chunk of stream) {
                    if (!chunk.choices?.length) {
                        console.log('\nUsage:');
                        console.log(chunk.usage);
                        continue;
                    }
    
                    const delta = chunk.choices[0].delta;
                    
                    // Handle reasoning process
                    if (delta.reasoning_content) {
                        process.stdout.write(delta.reasoning_content);
                        reasoningContent += delta.reasoning_content;
                    }
                    
                    // Handle formal response
                    if (delta.content) {
                        if (!isAnswering) {
                            console.log('\n' + "=".repeat(20) + "Complete Response" + "=".repeat(20) + "\n");
                            isAnswering = true;
                        }
                        process.stdout.write(delta.content);
                        answerContent += delta.content;
                    }
                }
                
                // Add the complete response to the message history
                messages.push({ role: 'assistant', content: answerContent });
                console.log("\n");
                
            } catch (error) {
                console.error('Error:', error);
            }
        }
    }
    
    // Start the program
    main().catch(console.error);

    HTTP

    Sample code

    curl

    curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-plus-2025-04-28",
        "messages": [
            {
                "role": "user", 
                "content": "Hello"
            },
            {
                "role": "assistant",
                "content": "Hello! Nice to meet you, how can I help you?"
            },
            {
                "role": "user",
                "content": "Who are you?"
            }
        ],
        "stream": true,
        "stream_options": {
            "include_usage": true
        },
        "enable_thinking": true
    }'

    DashScope

    Implement multi-round conversation through the DashScope SDK or over HTTP.

    Python

    Sample code

    import os
    import dashscope
    dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"
    
    messages = []
    conversation_idx = 1
    while True:
        print("=" * 20 + f"Round {conversation_idx} Conversation" + "=" * 20)
        conversation_idx += 1
        user_msg = {"role": "user", "content": input("Please enter your message: ")}
        messages.append(user_msg)
        response = dashscope.Generation.call(
            # If the environment variable is not set, please replace the following with the Model Studio API Key: api_key="sk-xxx",
            api_key=os.getenv('DASHSCOPE_API_KEY'),
            # qwen-plus-2025-04-28 is used as an example here; you can switch to other deep thinking models as needed
            model="qwen-plus-2025-04-28", 
            messages=messages,
        # The enable_thinking parameter enables the reasoning process; it has no effect on QwQ models
            enable_thinking=True,
            result_format="message",
            stream=True,
            incremental_output=True
        )
        # Define complete reasoning process
        reasoning_content = ""
        # Define complete response
        answer_content = ""
        # Determine whether the reasoning process has finished and response has started
        is_answering = False
        print("=" * 20 + "Thinking Process" + "=" * 20)
        for chunk in response:
            # Ignore if both reasoning process and response are empty
            if (chunk.output.choices[0].message.content == "" and 
                chunk.output.choices[0].message.reasoning_content == ""):
                pass
            else:
                # If currently in reasoning process
                if (chunk.output.choices[0].message.reasoning_content != "" and 
                    chunk.output.choices[0].message.content == ""):
                    print(chunk.output.choices[0].message.reasoning_content, end="", flush=True)
                    reasoning_content += chunk.output.choices[0].message.reasoning_content
                # If currently in response
                elif chunk.output.choices[0].message.content != "":
                    if not is_answering:
                        print("\n" + "=" * 20 + "Complete Response" + "=" * 20)
                        is_answering = True
                    print(chunk.output.choices[0].message.content, end="", flush=True)
                    answer_content += chunk.output.choices[0].message.content
        # Add the model's response content to the context
        messages.append({"role": "assistant", "content": answer_content})
        print("\n")
        # If you need to print the complete reasoning process and complete response, please uncomment the code below and run it
        # print("=" * 20 + "Complete Thinking Process" + "=" * 20 + "\n")
        # print(f"{reasoning_content}")
        # print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
        # print(f"{answer_content}")

    Java

    Sample code

    // Version of dashscope SDK >= 2.19.4
    import java.util.Arrays;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import com.alibaba.dashscope.aigc.generation.Generation;
    import com.alibaba.dashscope.aigc.generation.GenerationParam;
    import com.alibaba.dashscope.aigc.generation.GenerationResult;
    import com.alibaba.dashscope.common.Message;
    import com.alibaba.dashscope.common.Role;
    import com.alibaba.dashscope.exception.ApiException;
    import com.alibaba.dashscope.exception.InputRequiredException;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import io.reactivex.Flowable;
    import java.lang.System;
    import java.util.List;
    import com.alibaba.dashscope.protocol.Protocol;
    
    
    public class Main {
        private static final Logger logger = LoggerFactory.getLogger(Main.class);
        private static StringBuilder reasoningContent = new StringBuilder();
        private static StringBuilder finalContent = new StringBuilder();
        private static boolean isFirstPrint = true;
    
        private static void handleGenerationResult(GenerationResult message) {
            String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
            String content = message.getOutput().getChoices().get(0).getMessage().getContent();
    
            if (!reasoning.isEmpty()) {
                reasoningContent.append(reasoning);
                if (isFirstPrint) {
                    System.out.println("====================Thinking Process====================");
                    isFirstPrint = false;
                }
                System.out.print(reasoning);
            }
    
            if (!content.isEmpty()) {
                finalContent.append(content);
                if (!isFirstPrint) {
                    System.out.println("\n====================Complete Response====================");
                    isFirstPrint = true;
                }
                System.out.print(content);
            }
        }
        private static GenerationParam buildGenerationParam(List<Message> msgs) {
            return GenerationParam.builder()
                    // If the environment variable is not set, please replace the following with the Model Studio API Key: .apiKey("sk-xxx")
                    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                    // qwen-plus-2025-04-28 is used as an example here; you can switch to other model names as needed
                    .model("qwen-plus-2025-04-28")
                    .enableThinking(true)
                    .messages(msgs)
                    .incrementalOutput(true)
                    .build();
        }
        public static void streamCallWithMessage(Generation gen, List<Message> msgs)
                throws NoApiKeyException, ApiException, InputRequiredException {
            GenerationParam param = buildGenerationParam(msgs);
            Flowable<GenerationResult> result = gen.streamCall(param);
            result.blockingForEach(message -> handleGenerationResult(message));
        }
    
        public static void main(String[] args) {
            try {
                Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
                Message userMsg1 = Message.builder()
                        .role(Role.USER.getValue())
                        .content("Hello")
                        .build();
                Message assistantMsg = Message.builder()
                        .role(Role.ASSISTANT.getValue())
                        .content("Hello! Nice to meet you, is there anything I can assist you with?")
                        .build();
                Message userMsg2 = Message.builder()
                        .role(Role.USER.getValue())
                        .content("Who are you?")
                        .build();
                List<Message> msgs = Arrays.asList(userMsg1, assistantMsg, userMsg2);
                streamCallWithMessage(gen, msgs);
    //             Print the final result
    //            if (reasoningContent.length() > 0) {
    //                System.out.println("\n====================Complete Response====================");
    //                System.out.println(finalContent.toString());
    //            }
            } catch (ApiException | NoApiKeyException | InputRequiredException e) {
                logger.error("An exception occurred: {}", e.getMessage());
            }
            System.exit(0);
        }
    }

    HTTP

    Sample code

    curl

    curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -H "X-DashScope-SSE: enable" \
    -d '{
        "model": "qwen-plus-2025-04-28",
        "input":{
            "messages":[      
                {
                    "role": "user",
                    "content": "Hello"
                },
                {
                    "role": "assistant",
                    "content": "Hello! Nice to meet you, how can I help you?"
                },
                {
                    "role": "user",
                    "content": "Who are you?"
                }
            ]
        },
        "parameters":{
            "enable_thinking": true,
            "incremental_output": true,
            "result_format": "message"
        }
    }'

    Limit thinking length

    Deep thinking models may sometimes produce lengthy reasoning processes, resulting in long wait times and high token consumption. To mitigate this, set the thinking_budget parameter to limit the length of the reasoning process.

    If the number of reasoning tokens exceeds thinking_budget, the reasoning content will be truncated, and the final response will begin immediately.
    Only Qwen3 supports this parameter.
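    Whether the budget was actually reached can be checked from the token usage returned at the end of the stream. Below is a minimal sketch, assuming a usage payload shaped like the DashScope sample responses further down this page (`output_tokens_details.reasoning_tokens`); verify the exact field names against the responses your SDK version returns:

```python
# Minimal sketch: check whether the reasoning process hit thinking_budget.
# The field layout is an assumption based on the DashScope sample responses
# on this page; check your own responses before relying on it.
THINKING_BUDGET = 50

def reasoning_was_truncated(usage: dict, budget: int = THINKING_BUDGET) -> bool:
    """Return True if the reasoning token count reached the budget (likely truncated)."""
    details = usage.get("output_tokens_details", {})
    return details.get("reasoning_tokens", 0) >= budget

# Usage payload copied from the sample HTTP response below
usage = {
    "total_tokens": 149,
    "output_tokens": 138,
    "input_tokens": 11,
    "output_tokens_details": {"reasoning_tokens": 50},
}
print(reasoning_was_truncated(usage))  # True: reasoning stopped at the budget
```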

    OpenAI

    Python

    Sample code

    from openai import OpenAI
    import os
    
    # Initialize OpenAI client
    client = OpenAI(
        # If the environment variable is not set, please replace the following with the Model Studio API Key: api_key="sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    
    messages = [{"role": "user", "content": "Who are you?"}]
    
    completion = client.chat.completions.create(
        model="qwen-plus-2025-04-28",  # You can switch to other deep thinking models as needed
        messages=messages,
        # The enable_thinking parameter initiates the reasoning process, and the thinking_budget parameter sets the maximum number of tokens for the reasoning process. Both parameters are ineffective for QwQ models.
        extra_body={
            "enable_thinking": True,
            "thinking_budget": 50
        },
        stream=True,
        # stream_options={
        #     "include_usage": True
        # },
    )
    
    reasoning_content = ""  # Complete reasoning process
    answer_content = ""  # Complete response
    is_answering = False  # Whether entering the response phase
    print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
    
    for chunk in completion:
        if not chunk.choices:
            print("\nUsage:")
            print(chunk.usage)
            continue
    
        delta = chunk.choices[0].delta
    
        # Collect only reasoning content
        if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
            if not is_answering:
                print(delta.reasoning_content, end="", flush=True)
            reasoning_content += delta.reasoning_content
    
        # Receive content and start responding
        if hasattr(delta, "content") and delta.content:
            if not is_answering:
                print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
                is_answering = True
            print(delta.content, end="", flush=True)
            answer_content += delta.content

    Sample response

    ====================Thinking Process====================
    
    Alright, the user asked "Who are you," and I need to provide a clear and friendly response. First, I should clarify my identity as Qwen, developed by Tongyi Lab under Alibaba Group. Next, I should explain my main functions, such as answering questions, generating text, logical reasoning, etc., aimed at helping and providing convenience to users.
    
    ====================Complete Response====================
    
    I am Qwen, a large-scale language model developed by Tongyi Lab under Alibaba Group. I am capable of answering questions, generating text, performing logical reasoning, programming, and more, all aimed at providing help and convenience to users. Is there anything I can assist you with?

    Node.js

    Sample code

    import OpenAI from "openai";
    import process from 'process';
    
    // Initialize OpenAI client
    const openai = new OpenAI({
        apiKey: process.env.DASHSCOPE_API_KEY, // Retrieve from environment variables
        baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
    });
    
    let reasoningContent = '';
    let answerContent = '';
    let isAnswering = false;
    
    async function main() {
        try {
            const messages = [{ role: 'user', content: 'Who are you?' }];
            const stream = await openai.chat.completions.create({
                // qwen-plus-2025-04-28 is used as an example here; you can switch to other deep thinking models as needed
                model: 'qwen-plus-2025-04-28',
                messages,
                stream: true,
                // The enable_thinking parameter initiates the reasoning process, and the thinking_budget parameter sets the maximum number of tokens for the reasoning process. Both parameters are ineffective for QwQ models.
                enable_thinking: true,
                thinking_budget: 50
            });
            console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
    
            for await (const chunk of stream) {
                if (!chunk.choices?.length) {
                    console.log('\nUsage:');
                    console.log(chunk.usage);
                    continue;
                }
    
                const delta = chunk.choices[0].delta;
                
                // Collect only reasoning content
                if (delta.reasoning_content) {
                    if (!isAnswering) {
                        process.stdout.write(delta.reasoning_content);
                    }
                    reasoningContent += delta.reasoning_content;
                }
    
                // Receive content and start responding
                if (delta.content) {
                    if (!isAnswering) {
                        console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
                        isAnswering = true;
                    }
                    process.stdout.write(delta.content);
                    answerContent += delta.content;
                }
            }
        } catch (error) {
            console.error('Error:', error);
        }
    }
    
    main();
    

    Sample response

    ====================Thinking Process====================
    
    Alright, the user asked "Who are you?" and I need to provide a clear and accurate response. First, I should introduce my identity as Qwen, developed by Tongyi Lab under Alibaba Group. Next, I should explain my main functions, such as answering questions.
    
    ====================Complete Response====================
    
    I am Qwen, a large-scale language model independently developed by Tongyi Lab under Alibaba Group. I am capable of answering questions, generating text, performing logical reasoning, programming, and handling various tasks. If you have any questions or need assistance, feel free to let me know anytime!

    HTTP

    Sample code

    curl

    curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-plus-2025-04-28",
        "messages": [
            {
                "role": "user", 
                "content": "Who are you?"
            }
        ],
        "stream": true,
        "stream_options": {
            "include_usage": true
        },
        "enable_thinking": true,
        "thinking_budget": 50
    }'

    Sample response

    data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
    
    .....
    
    data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
    
    data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":10,"completion_tokens":360,"total_tokens":370},"created":1745485391,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
    
    data: [DONE]
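    When calling the compatible-mode HTTP endpoint directly (as in the curl example above), each `data:` line is a standalone JSON chunk. Below is a minimal parsing sketch, assuming the chunk shape shown in the sample response; fetching the stream itself is omitted, and the example line is illustrative:

```python
import json

def parse_sse_line(line: str):
    """Parse one 'data:' line from the compatible-mode SSE stream.

    Returns (reasoning_content, content); either element may be None.
    """
    payload = line.removeprefix("data: ").strip()
    if not payload or payload == "[DONE]":
        return None, None
    chunk = json.loads(payload)
    if not chunk.get("choices"):  # the final chunk carries only usage
        return None, None
    delta = chunk["choices"][0]["delta"]
    return delta.get("reasoning_content"), delta.get("content")

# Illustrative line in the same shape as the sample response above
line = ('data: {"choices":[{"delta":{"content":null,"role":"assistant",'
        '"reasoning_content":"Well"},"index":0,"logprobs":null,'
        '"finish_reason":null}],"object":"chat.completion.chunk"}')
print(parse_sse_line(line))  # ('Well', None)
```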

    DashScope

    Python

    Sample code

    import os
    from dashscope import Generation
    import dashscope
    dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"
    
    messages = [{"role": "user", "content": "Who are you?"}]
    
    completion = Generation.call(
        # If the environment variable is not set, please replace the following with the Model Studio API Key: api_key = "sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # You can switch to other deep thinking models as needed
        model="qwen-plus-2025-04-28",
        messages=messages,
        result_format="message",
        # Enable deep thinking; this parameter is ineffective for QwQ models
        enable_thinking=True,
        # Set the maximum number of tokens for the reasoning process; this parameter is ineffective for QwQ models
        thinking_budget=50,
        stream=True,
        incremental_output=True,
    )
    
    # Define complete reasoning process
    reasoning_content = ""
    # Define complete response
    answer_content = ""
    # Determine whether the reasoning process has finished and response has started
    is_answering = False
    
    print("=" * 20 + "Thinking Process" + "=" * 20)
    
    for chunk in completion:
        # Ignore if both reasoning process and response are empty
        if (
            chunk.output.choices[0].message.content == ""
            and chunk.output.choices[0].message.reasoning_content == ""
        ):
            pass
        else:
            # If currently in reasoning process
            if (
                chunk.output.choices[0].message.reasoning_content != ""
                and chunk.output.choices[0].message.content == ""
            ):
                print(chunk.output.choices[0].message.reasoning_content, end="", flush=True)
                reasoning_content += chunk.output.choices[0].message.reasoning_content
            # If currently in response
            elif chunk.output.choices[0].message.content != "":
                if not is_answering:
                    print("\n" + "=" * 20 + "Complete Response" + "=" * 20)
                    is_answering = True
                print(chunk.output.choices[0].message.content, end="", flush=True)
                answer_content += chunk.output.choices[0].message.content
    
    # If you need to print the complete reasoning process and complete response, please uncomment the code below and run it
    # print("=" * 20 + "Complete Thinking Process" + "=" * 20 + "\n")
    # print(f"{reasoning_content}")
    # print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
    # print(f"{answer_content}")
    

    Sample response

    ====================Thinking Process====================
    
    Alright, the user asked "Who are you?" and I need to provide a clear and friendly response. First, I should introduce my identity, namely Qwen, developed by Tongyi Lab under Alibaba Group. Next, I should explain my main functions, such as answering questions.
    
    ====================Complete Response====================
    
    I am Qwen, a large-scale language model independently developed by Tongyi Lab under Alibaba Group. I am capable of answering questions, generating text, performing logical reasoning, programming, and more, aiming to provide comprehensive, accurate, and useful information and assistance to users. Is there anything I can help you with?

    Java

    Sample code

    // Version of dashscope SDK >= 2.19.4
    import java.util.Arrays;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import com.alibaba.dashscope.aigc.generation.Generation;
    import com.alibaba.dashscope.aigc.generation.GenerationParam;
    import com.alibaba.dashscope.aigc.generation.GenerationResult;
    import com.alibaba.dashscope.common.Message;
    import com.alibaba.dashscope.common.Role;
    import com.alibaba.dashscope.exception.ApiException;
    import com.alibaba.dashscope.exception.InputRequiredException;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import io.reactivex.Flowable;
    import java.lang.System;
    import com.alibaba.dashscope.utils.Constants;
    
    public class Main {
        static {
            Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
        }
        private static final Logger logger = LoggerFactory.getLogger(Main.class);
        private static StringBuilder reasoningContent = new StringBuilder();
        private static StringBuilder finalContent = new StringBuilder();
        private static boolean isFirstPrint = true;
    
        private static void handleGenerationResult(GenerationResult message) {
            String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
            String content = message.getOutput().getChoices().get(0).getMessage().getContent();
    
            if (!reasoning.isEmpty()) {
                reasoningContent.append(reasoning);
                if (isFirstPrint) {
                    System.out.println("====================Thinking Process====================");
                    isFirstPrint = false;
                }
                System.out.print(reasoning);
            }
    
            if (!content.isEmpty()) {
                finalContent.append(content);
                if (!isFirstPrint) {
                    System.out.println("\n====================Complete Response====================");
                    isFirstPrint = true;
                }
                System.out.print(content);
            }
        }
        private static GenerationParam buildGenerationParam(Message userMsg) {
            return GenerationParam.builder()
                    // If the environment variable is not set, please replace the following with the Model Studio API Key: .apiKey("sk-xxx")
                    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                    .model("qwen-plus-2025-04-28")
                    .enableThinking(true)
                    .thinkingBudget(50)
                    .incrementalOutput(true)
                    .resultFormat("message")
                    .messages(Arrays.asList(userMsg))
                    .build();
        }
        public static void streamCallWithMessage(Generation gen, Message userMsg)
                throws NoApiKeyException, ApiException, InputRequiredException {
            GenerationParam param = buildGenerationParam(userMsg);
            Flowable<GenerationResult> result = gen.streamCall(param);
            result.blockingForEach(message -> handleGenerationResult(message));
        }
    
        public static void main(String[] args) {
            try {
                Generation gen = new Generation();
                Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
                streamCallWithMessage(gen, userMsg);
    //             Print the final result
    //            if (reasoningContent.length() > 0) {
    //                System.out.println("\n====================Complete Response====================");
    //                System.out.println(finalContent.toString());
    //            }
            } catch (ApiException | NoApiKeyException | InputRequiredException e) {
                logger.error("An exception occurred: {}", e.getMessage());
            }
            System.exit(0);
        }
    }
    

    Sample response

    ====================Thinking Process====================
    
    Alright, the user asked "Who are you?" and I need to provide a clear and friendly response. First, I should introduce my identity, namely Qwen, developed by Tongyi Lab under Alibaba Group. Next, I should explain my main functions, such as answering questions, generating text, logical reasoning, programming, etc., to offer comprehensive, accurate, and helpful information and assistance to users.
    
    ====================Complete Response====================
    
    I am Qwen, a large-scale language model independently developed by Tongyi Lab under Alibaba Group. I am capable of answering questions, generating text, performing logical reasoning, programming, and more, aiming to provide comprehensive, accurate, and useful information and assistance to users. Is there anything I can help you with?

    HTTP

    Sample code

    curl

    curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -H "X-DashScope-SSE: enable" \
    -d '{
        "model": "qwen-plus-2025-04-28",
        "input":{
            "messages":[      
                {
                    "role": "user",
                    "content": "Who are you?"
                }
            ]
        },
        "parameters":{
            "enable_thinking": true,
            "thinking_budget": 50,
            "incremental_output": true,
            "result_format": "message"
        }
    }'

    Sample response

    id:1
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Well","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"output_tokens":3,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":1}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
    
    id:2
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":", ","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"output_tokens":4,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":2}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
    
    ......
    
    id:133
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":149,"output_tokens":138,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":50}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
    
    id:134
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":149,"output_tokens":138,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":50}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}

    Function calling

    Despite their reasoning capability, deep thinking models cannot interact with the outside world on their own. Function calling introduces external tools that let the model perform tasks such as weather queries, database queries, and sending emails.

    After completing the thinking process, the Qwen3 and QwQ models output tool-calling information. The tool_choice parameter can only be set to "auto" (the default, meaning the model selects tools on its own) or "none" (forcing the model not to select any tools).
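    On the application side, the tool-calling information returned by the model must be matched to a local function and executed, with the result sent back in a follow-up message. Below is a minimal dispatch sketch with local stand-in implementations (the weather function is a placeholder; a real implementation would call an external API):

```python
import json
from datetime import datetime

# Local stand-in implementations for the two tools used in the samples below
def get_current_time() -> str:
    return datetime.now().isoformat(timespec="seconds")

def get_current_weather(location: str) -> str:
    # Placeholder: a real implementation would call a weather API here
    return f"Sunny in {location}"

TOOL_REGISTRY = {
    "get_current_time": get_current_time,
    "get_current_weather": get_current_weather,
}

def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Look up the tool by name and call it with its JSON-encoded arguments."""
    func = TOOL_REGISTRY[name]
    kwargs = json.loads(arguments_json or "{}")
    return func(**kwargs)

print(dispatch_tool_call("get_current_weather", '{"location": "Hangzhou"}'))  # Sunny in Hangzhou
```

    The tool result is then appended to messages as a {"role": "tool", ...} entry and the model is called again to produce the final answer.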

    OpenAI

    Python

    Sample code

    import os
    from openai import OpenAI
    
    # Initialize OpenAI client, configuring Alibaba Cloud Model Studio Service
    client = OpenAI(
        # If the environment variable is not set, please replace the following with the Model Studio API Key: api_key="sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # Read API key from environment variable
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    
    # Define available tools list
    tools = [
        # Tool 1: Get the current time
        {
            "type": "function",
            "function": {
                "name": "get_current_time",
                "description": "Useful for knowing the current time.",
                "parameters": {}  # No parameters needed
            }
        },  
        # Tool 2: Get the weather of a specified city
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Useful for querying the weather of a specified city.",
                "parameters": {  
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City or district, e.g., Beijing, Hangzhou, Yuhang District, etc."
                        }
                    },
                    "required": ["location"]  # Required parameter
                }
            }
        }
    ]
    
    messages = [{"role": "user", "content": input("Please enter your question: ")}]
    completion = client.chat.completions.create(
        # qwen-plus-2025-04-28 is used as an example here; you can switch to other deep thinking models
        model="qwen-plus-2025-04-28",
        messages=messages,
        extra_body={
            # Enable deep thinking; this parameter is ineffective for QwQ models
            "enable_thinking": True
        },
        tools=tools,
        parallel_tool_calls=True,
        stream=True,
        # Uncomment if you want to retrieve token consumption information
        # stream_options={
        #     "include_usage": True
        # }
    )
    
    reasoning_content = ""  # Accumulates the complete reasoning process
    answer_content = ""     # Accumulates the complete response
    tool_info = []          # Stores tool invocation information
    is_answering = False    # Whether the reasoning phase has ended and the response phase has begun
    print("="*20+"Thinking Process"+"="*20)
    for chunk in completion:
        if not chunk.choices:
            # Handle usage information
            print("\n"+"="*20+"Usage"+"="*20)
            print(chunk.usage)
        else:
            delta = chunk.choices[0].delta
            # Handle AI's thought process (chain reasoning)
            if hasattr(delta, 'reasoning_content') and delta.reasoning_content is not None:
                reasoning_content += delta.reasoning_content
                print(delta.reasoning_content,end="",flush=True)  # Real-time output of the thought process
                
            # Handle final response content
            else:
                if not is_answering:  # Print title when entering the response phase for the first time
                    is_answering = True
                    print("\n"+"="*20+"Response Content"+"="*20)
                if delta.content is not None:
                    answer_content += delta.content
                    print(delta.content,end="",flush=True)  # Stream output of response content
                
                # Handle tool invocation information (support parallel tool calls)
                if delta.tool_calls is not None:
                    for tool_call in delta.tool_calls:
                        index = tool_call.index  # Tool call index, used for parallel calls
                        
                        # Dynamically expand tool information storage list
                        while len(tool_info) <= index:
                            tool_info.append({})
                        
                        # Collect tool call ID (used for subsequent function calls)
                        if tool_call.id:
                            tool_info[index]['id'] = tool_info[index].get('id', '') + tool_call.id
                        
                        # Collect function name (used for subsequent routing to specific functions)
                        if tool_call.function and tool_call.function.name:
                            tool_info[index]['name'] = tool_info[index].get('name', '') + tool_call.function.name
                        
                        # Collect function parameters (in JSON string format, need subsequent parsing)
                        if tool_call.function and tool_call.function.arguments:
                            tool_info[index]['arguments'] = tool_info[index].get('arguments', '') + tool_call.function.arguments
                
    print("\n"+"="*19+"Tool Invocation Information"+"="*19)
    if not tool_info:
        print("No tool invocation")
    else:
        print(tool_info)
    

    Sample response

    Enter "weather of the four municipalities".

    ====================Thinking Process====================
    
    Alright, the user asked about the "weather of the four municipalities." First, I need to clarify which four municipalities these are. According to China's administrative regions, the municipalities are Beijing, Shanghai, Tianjin, and Chongqing. Therefore, the user wants to know the weather conditions of these four cities.
    
    Next, I need to check the available tools. Among the provided tools, there is the `get_current_weather` function, with the parameter `location` being a string type. Each city must be queried individually, as the function can check only one location at a time. Thus, I need to call this function once for each municipality.
    
    Then, I need to consider how to generate the correct tool invocation. Each call should include the city name as a parameter. For example, the first call is for Beijing, the second is for Shanghai, and so on. Ensure that the parameter name is `location` and the value is the correct city name.
    
    Additionally, the user might want the weather information for each city, so it is important to ensure that each function call is correct and flawless. It might require calling four times consecutively, once for each city. However, according to the tool usage rules, it may need processing in multiple steps, or generating multiple calls at once. But according to the example, perhaps only one function is called at a time, so it might need to be done gradually.
    
    Finally, confirm if there are any other factors to consider, such as the correctness of parameters, the accuracy of city names, and whether potential errors, like non-existent cities or unavailable API, need handling. But for now, the four municipalities are clear, and there shouldn't be any issues.
    
    ====================Response Content====================
    
    ===================Tool Invocation Information===================
    
    [{'id': 'call_767af2834c12488a8fe6e3', 'name': 'get_current_weather', 'arguments': '{"location": "Beijing"}'}, {'id': 'call_2cb05a349c89437a947ada', 'name': 'get_current_weather', 'arguments': '{"location": "Shanghai"}'}, {'id': 'call_988dd180b2ca4b0a864ea7', 'name': 'get_current_weather', 'arguments': '{"location": "Tianjin"}'}, {'id': 'call_4e98c57ea96a40dba26d12', 'name': 'get_current_weather', 'arguments': '{"location": "Chongqing"}'}]
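    The collected tool_info entries hold the function name and a JSON-encoded argument string for each call. A minimal sketch of dispatching them to local implementations (the two stub functions below are hypothetical illustrations, not part of any SDK; a real tool would call an actual weather or time service):

```python
import json
from datetime import datetime

# Hypothetical local stub implementations of the two declared tools
def get_current_time():
    return "Current time: " + datetime.now().strftime("%Y-%m-%d %H:%M:%S") + "."

def get_current_weather(location):
    return f"{location} is sunny today."  # Stub; replace with a real weather API call

TOOL_REGISTRY = {
    "get_current_time": get_current_time,
    "get_current_weather": get_current_weather,
}

def dispatch(tool_info):
    """Parse each collected tool call and run the matching local function,
    returning tool messages keyed by the original tool call IDs."""
    results = []
    for call in tool_info:
        func = TOOL_REGISTRY[call["name"]]
        args = json.loads(call["arguments"]) if call.get("arguments") else {}
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": func(**args),
        })
    return results

tool_info = [{"id": "call_0", "name": "get_current_weather",
              "arguments": '{"location": "Beijing"}'}]
print(dispatch(tool_info))
```

    Because the arguments arrive as a JSON string assembled from streamed fragments, they must be fully accumulated before json.loads is called; parsing a partial fragment would raise an error.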

    Node.js

    Sample code

    import OpenAI from "openai";
    import readline from 'node:readline/promises';
    import { stdin as input, stdout as output } from 'node:process';
    
    const openai = new OpenAI({
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    });
    
    const tools = [
        {
            type: "function",
            function: {
                name: "get_current_time",
                description: "Useful for knowing the current time.",
                parameters: {}
            }
        },
        {
            type: "function",
            function: {
                name: "get_current_weather",
                description: "Useful for querying the weather of a specified city.",
                parameters: {
                    type: "object",
                    properties: {
                        location: {
                            type: "string",
                            description: "City or district, e.g., Beijing, Hangzhou, Yuhang District, etc."
                        }
                    },
                    required: ["location"]
                }
            }
        }
    ];
    
    async function main() {
        const rl = readline.createInterface({ input, output });
        const question = await rl.question("Please enter your question: "); 
        rl.close();
        
        const messages = [{ role: "user", content: question }];
        
        let reasoningContent = "";
        let answerContent = "";
        const toolInfo = [];
        let isAnswering = false;
    
        console.log("=".repeat(20) + "Thinking Process" + "=".repeat(20));
        
        try {
            const stream = await openai.chat.completions.create({
                // qwen-plus-2025-04-28 is used as an example here; you can switch to other deep thinking models
                model: "qwen-plus-2025-04-28",
                messages,
                // Enable deep thinking; this parameter is ineffective for QwQ models
                enable_thinking: true,
                tools,
                stream: true,
                parallel_tool_calls: true
            });
    
            for await (const chunk of stream) {
                if (!chunk.choices?.length) {
                    console.log("\n" + "=".repeat(20) + "Usage" + "=".repeat(20));
                    console.log(chunk.usage);
                    continue;
                }
    
                const delta = chunk.choices[0]?.delta;
                if (!delta) continue;
    
                // Handle thought process
                if (delta.reasoning_content) {
                    reasoningContent += delta.reasoning_content;
                    process.stdout.write(delta.reasoning_content);
                }
                // Handle response content
                else {
                    if (!isAnswering) {
                        isAnswering = true;
                        console.log("\n" + "=".repeat(20) + "Response Content" + "=".repeat(20));
                    }
                    if (delta.content) {
                        answerContent += delta.content;
                        process.stdout.write(delta.content);
                    }
                    // Handle tool invocation
                    if (delta.tool_calls) {
                        for (const toolCall of delta.tool_calls) {
                            const index = toolCall.index;
                            
                            // Ensure array length is sufficient
                            while (toolInfo.length <= index) {
                                toolInfo.push({});
                            }
                            
                            // Update tool ID
                            if (toolCall.id) {
                                toolInfo[index].id = (toolInfo[index].id || "") + toolCall.id;
                            }
                            
                            // Update function name
                            if (toolCall.function?.name) {
                                toolInfo[index].name = (toolInfo[index].name || "") + toolCall.function.name;
                            }
                            
                            // Update parameters
                            if (toolCall.function?.arguments) {
                                toolInfo[index].arguments = (toolInfo[index].arguments || "") + toolCall.function.arguments;
                            }
                        }
                    }
                }
            }
    
            console.log("\n" + "=".repeat(19) + "Tool Invocation Information" + "=".repeat(19));
            console.log(toolInfo.length ? toolInfo : "No tool invocation");
    
        } catch (error) {
            console.error("Error occurred:", error);
        }
    }
    
    main(); 
    

    Sample response

    Enter "weather of the four municipalities".

    Please enter your question: weather of the four municipalities
    ====================Thinking Process====================
    
    Alright, the user asked about the weather in the four municipalities. First, I need to clarify which these four municipalities are in China. They are Beijing, Shanghai, Tianjin, and Chongqing, right? Next, I need to call the weather query function for each city.
    
    The user's question likely requires me to separately obtain the weather for these four cities. Each city requires calling the get_current_weather function, with the parameter being the city's name. I need to ensure the parameters are correct, such as the full names of the municipalities, like "Beijing", "Shanghai", "Tianjin", and "Chongqing."
    
    Then, I need to call the weather API for these four cities sequentially. Each call needs a separate tool_call. The user likely wants the current weather for each city, so each call must be accurate and correct. Careful attention to the correct spelling and names of each city is necessary to avoid errors. For example, Chongqing might be abbreviated as "Chongqing" sometimes, so using the full name is recommended in the parameters.
    
    Now, I need to generate four tool_calls, each corresponding to a municipality. Check the correctness of each parameter and arrange them in order. This way, the user will receive weather data for the four municipalities.
    
    ====================Response Content====================
    
    ===================Tool Invocation Information===================
    
    [
      {
        "id": "call_21dc802e717f491298d1b2",
        "name": "get_current_weather",
        "arguments": "{\"location\": \"Beijing\"}"
      },
      {
        "id": "call_2cd3be1d2f694c4eafd4e5",
        "name": "get_current_weather",
        "arguments": "{\"location\": \"Shanghai\"}"
      },
      {
        "id": "call_48cf3f78e02940bd9085e4",
        "name": "get_current_weather",
        "arguments": "{\"location\": \"Tianjin\"}"
      },
      {
        "id": "call_e230a2b4c64f4e658d223e",
        "name": "get_current_weather",
        "arguments": "{\"location\": \"Chongqing\"}"
      }
    ]

    HTTP

    Sample code

    curl

    curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-plus-2025-04-28",
        "messages": [
            {
                "role": "user", 
                "content": "How is the weather in Hangzhou?"
            }
        ],
        "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_time",
                "description": "Useful for knowing the current time.",
                "parameters": {}
            }
        },
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Useful for querying the weather of a specified city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location":{
                            "type": "string",
                            "description": "City or district, e.g., Beijing, Hangzhou, Yuhang District, etc."
                        }
                    },
                    "required": ["location"]
                }
            }
        }
      ],
      "enable_thinking": true,
      "stream": true
    }'
    

    DashScope

    Python

    Sample code

    import dashscope
    dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"
    tools = [
        # Tool 1: Get the current time
        {
            "type": "function",
            "function": {
                "name": "get_current_time",
                "description": "Useful for knowing the current time.",
                "parameters": {}  # No parameters needed since obtaining current time doesn't require input
            }
        },  
        # Tool 2: Get the weather for a specified city
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Useful for querying the weather of a specified city.",
                "parameters": {  
                    "type": "object",
                    "properties": {
                        # Location must be provided when querying weather, hence parameter is set as location
                        "location": {
                            "type": "string",
                            "description": "City or district, e.g., Beijing, Hangzhou, Yuhang District, etc."
                        }
                    },
                    "required": ["location"]
                }
            }
        }
    ]
    
    # Define question
    messages = [{"role": "user", "content": input("Please enter your question: ")}]
    completion = dashscope.Generation.call(
        # qwen-plus-2025-04-28 is used as an example here; you can switch to other deep thinking models
        model="qwen-plus-2025-04-28", 
        messages=messages,
        enable_thinking=True,
        tools=tools,
        parallel_tool_calls=True,
        stream=True,
        incremental_output=True,
        result_format="message"
    )
    
    reasoning_content = ""
    answer_content = ""
    tool_info = []
    is_answering = False
    print("="*20+"Thinking Process"+"="*20)
    
    for chunk in completion:
        if chunk.status_code == 200:
            msg = chunk.output.choices[0].message
            
            # Handle thought process
            if 'reasoning_content' in msg and msg.reasoning_content:
                reasoning_content += msg.reasoning_content
                print(msg.reasoning_content, end="", flush=True)
            
            # Handle response content
            if 'content' in msg and msg.content:
                if not is_answering:
                    is_answering = True
                    print("\n"+"="*20+"Response Content"+"="*20)
                answer_content += msg.content
                print(msg.content, end="", flush=True)
            
            # Handle tool invocation
            if 'tool_calls' in msg and msg.tool_calls:
                for tool_call in msg.tool_calls:
                    index = tool_call['index']
                    
                    while len(tool_info) <= index:
                        tool_info.append({'id': '', 'name': '', 'arguments': ''})  # Initialize all fields
                    
                    # Incrementally update tool ID
                    if 'id' in tool_call:
                        tool_info[index]['id'] += tool_call.get('id', '')
                    
                    # Incrementally update function information
                    if 'function' in tool_call:
                        func = tool_call['function']
                        # Incrementally update function name
                        if 'name' in func:
                            tool_info[index]['name'] += func.get('name', '')
                        # Incrementally update parameters
                        if 'arguments' in func:
                            tool_info[index]['arguments'] += func.get('arguments', '')
    
    print("\n"+"="*19+"Tool Invocation Information"+"="*19)
    if not tool_info:
        print("No tool invocation")
    else:
        print(tool_info)
    

    Sample response

    Enter "weather of the four municipalities".

    Please enter your question: weather of the four municipalities
    ====================Thinking Process====================
    
    Alright, the user asked about the weather in the four municipalities. First, I need to confirm which these municipalities are in China: Beijing, Shanghai, Tianjin, and Chongqing, right? Next, the user needs the weather information for each city, so I need to call the weather query function.
    
    However, the issue is that the user didn't specify the exact city names, just mentioned the four municipalities. I might need to specify each municipality's name and query them separately. For instance, Beijing, Shanghai, Tianjin, and Chongqing are the four municipalities I need to confirm are correctly identified.
    
    Next, I need to check the available tools; the user's provided function is `get_current_weather`, with `location` as the parameter. Therefore, I need to call this function for each municipality, providing the corresponding city name as the parameter. For example, the first call will have `location` set to Beijing, the second to Shanghai, the third to Tianjin, and the fourth to Chongqing.
    
    It might be necessary to note that sometimes municipalities like Chongqing might require more specific districts, but the user might only need city-level weather information. Using the municipality's name directly should be fine for this task. Following that, I need to generate four separate function calls, each corresponding to one municipality, so the user will receive weather information for the four cities.
    
    Finally, ensure each call's parameter is correct and nothing is missed. This way, the user's query will receive a complete response.
    
    ===================Tool Invocation Information===================
    
    [{'id': 'call_2f774ed97b0e4b24ab10ec', 'name': 'get_current_weather', 'arguments': '{"location": "Beijing"}'}, {'id': 'call_dc3b05b88baa48c58bc33a', 'name': 'get_current_weather', 'arguments': '{"location": "Shanghai"}'}, {'id': 'call_249b2de2f73340cdb46cbc', 'name': 'get_current_weather', 'arguments': '{"location": "Tianjin"}'}, {'id': 'call_833333634fda49d1b39e87', 'name': 'get_current_weather', 'arguments': '{"location": "Chongqing"}'}]
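    After the tools have been executed, their results are sent back to the model in a follow-up request so it can compose a final answer. A minimal sketch of assembling that second messages list (assuming the tool outputs are already available as strings; the message shapes follow the OpenAI-compatible format used throughout this page):

```python
def build_followup_messages(messages, tool_info, tool_outputs):
    """Append the assistant's tool calls and each tool's result so the
    model can produce a final answer in the next request."""
    assistant_msg = {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {"id": c["id"], "type": "function",
             "function": {"name": c["name"], "arguments": c["arguments"]}}
            for c in tool_info
        ],
    }
    tool_msgs = [
        {"role": "tool", "tool_call_id": c["id"], "content": out}
        for c, out in zip(tool_info, tool_outputs)
    ]
    return messages + [assistant_msg] + tool_msgs

messages = [{"role": "user", "content": "How is the weather in Hangzhou?"}]
tool_info = [{"id": "call_1", "name": "get_current_weather",
              "arguments": '{"location": "Hangzhou"}'}]
followup = build_followup_messages(messages, tool_info,
                                   ["Hangzhou is sunny today."])
```

    Each tool message must carry the tool_call_id of the call it answers; with parallel tool calls, keeping tool_info and tool_outputs in the same order preserves that pairing.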
    

    Java

    Sample code

    import java.util.Arrays;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import com.alibaba.dashscope.aigc.generation.Generation;
    import com.alibaba.dashscope.aigc.generation.GenerationParam;
    import com.alibaba.dashscope.aigc.generation.GenerationResult;
    import com.alibaba.dashscope.common.Message;
    import com.alibaba.dashscope.common.Role;
    import com.alibaba.dashscope.exception.ApiException;
    import com.alibaba.dashscope.exception.InputRequiredException;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import com.alibaba.dashscope.utils.JsonUtils;
    import com.alibaba.dashscope.tools.ToolFunction;
    import com.alibaba.dashscope.tools.FunctionDefinition;
    import io.reactivex.Flowable;
    import com.fasterxml.jackson.databind.node.ObjectNode;
    import java.lang.System;
    import com.github.victools.jsonschema.generator.Option;
    import com.github.victools.jsonschema.generator.OptionPreset;
    import com.github.victools.jsonschema.generator.SchemaGenerator;
    import com.github.victools.jsonschema.generator.SchemaGeneratorConfig;
    import com.github.victools.jsonschema.generator.SchemaGeneratorConfigBuilder;
    import com.github.victools.jsonschema.generator.SchemaVersion;
    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;
    import com.alibaba.dashscope.utils.Constants;
    
    public class Main {
        private static final Logger logger = LoggerFactory.getLogger(Main.class);
        private static ObjectNode jsonSchemaWeather;
        private static ObjectNode jsonSchemaTime;
        static {
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        }
    
        static class TimeTool {
            public String call() {
                LocalDateTime now = LocalDateTime.now();
                DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
                return "Current time: " + now.format(formatter) + ".";
            }
        }
    
        static class WeatherTool {
            private String location;
    
            public WeatherTool(String location) {
                this.location = location;
            }
    
            public String call() {
                return location + " is sunny today.";
            }
        }
    
        static {
            SchemaGeneratorConfigBuilder configBuilder = new SchemaGeneratorConfigBuilder(
                    SchemaVersion.DRAFT_2020_12, OptionPreset.PLAIN_JSON);
            SchemaGeneratorConfig config = configBuilder
                    .with(Option.EXTRA_OPEN_API_FORMAT_VALUES)
                    .without(Option.FLATTENED_ENUMS_FROM_TOSTRING)
                    .build();
            SchemaGenerator generator = new SchemaGenerator(config);
            jsonSchemaWeather = generator.generateSchema(WeatherTool.class);
            jsonSchemaTime = generator.generateSchema(TimeTool.class);
        }
    
        private static void handleGenerationResult(GenerationResult message) {
            System.out.println(JsonUtils.toJson(message));
        }
    
        public static void streamCallWithMessage(Generation gen, Message userMsg)
                throws NoApiKeyException, ApiException, InputRequiredException {
            GenerationParam param = buildGenerationParam(userMsg);
            Flowable<GenerationResult> result = gen.streamCall(param);
            result.blockingForEach(message -> handleGenerationResult(message));
        }
    
        private static GenerationParam buildGenerationParam(Message userMsg) {
            FunctionDefinition fdWeather = buildFunctionDefinition(
                    "get_current_weather", "Get the weather of a specified location", jsonSchemaWeather);
            FunctionDefinition fdTime = buildFunctionDefinition(
                    "get_current_time", "Get the current time", jsonSchemaTime);
    
            return GenerationParam.builder()
                    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                    .model("qwen-plus-2025-04-28")
                    .enableThinking(true)
                    .messages(Arrays.asList(userMsg))
                    .resultFormat(GenerationParam.ResultFormat.MESSAGE)
                    .incrementalOutput(true)
                    .tools(Arrays.asList(
                            ToolFunction.builder().function(fdWeather).build(),
                            ToolFunction.builder().function(fdTime).build()))
                    .build();
        }
    
        private static FunctionDefinition buildFunctionDefinition(
                String name, String description, ObjectNode schema) {
            return FunctionDefinition.builder()
                    .name(name)
                    .description(description)
                    .parameters(JsonUtils.parseString(schema.toString()).getAsJsonObject())
                    .build();
        }
    
        public static void main(String[] args) {
            try {
                Generation gen = new Generation();
                Message userMsg = Message.builder()
                        .role(Role.USER.getValue())
                        .content("Please tell me the weather in Hangzhou")
                        .build();
                streamCallWithMessage(gen, userMsg);
            } catch (ApiException | NoApiKeyException | InputRequiredException e) {
                logger.error("An exception occurred: {}", e.getMessage());
            }
            System.exit(0);
        }
    }
    

    Sample response

    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":6,"total_tokens":244},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"Well, the user want to"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":12,"total_tokens":250},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"know the weather in Hangzhou. I"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":16,"total_tokens":254},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"should first check whether I have"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":22,"total_tokens":260},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"related tools. Check the provided"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":28,"total_tokens":266},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"tools, I find get_current"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":34,"total_tokens":272},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"_weather. Its parameter is location"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":38,"total_tokens":276},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":". So I should call"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":43,"total_tokens":281},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"this function, and the parameter"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":48,"total_tokens":286},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"is Hangzhou. No other"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":52,"total_tokens":290},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"tool is needed. Because"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":56,"total_tokens":294},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"the user only asks about"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":60,"total_tokens":298},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"weather. Then, construct"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":64,"total_tokens":302},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"tool_call and fill in"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":68,"total_tokens":306},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"the name and parameter"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":73,"total_tokens":311},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":". Make sure the parameter is"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":78,"total_tokens":316},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"a JSON object and location is"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":82,"total_tokens":320},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"a string. Return"}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":88,"total_tokens":326},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"after checking."}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":106,"total_tokens":344},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":"","reasoning_content":"","tool_calls":[{"type":"function","id":"call_ecc41296dccc47baa01567","function":{"name":"get_current_weather","arguments":"{\"location\": \"Hangzhou"}}]}}]}}
    {"requestId":"4edb81cd-4647-9d5d-88f9-a4f30bc6d8dd","usage":{"input_tokens":238,"output_tokens":108,"total_tokens":346},"output":{"choices":[{"finish_reason":"tool_calls","message":{"role":"assistant","content":"","reasoning_content":"","tool_calls":[{"type":"function","id":"","function":{"arguments":"\"}"}}]}}]}}

    HTTP

    Sample code

    curl

    curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -H "X-DashScope-SSE: enable" \
    -d '{
        "model": "qwen-plus-2025-04-28",
        "input": {
            "messages": [      
                {
                    "role": "user",
                    "content": "Weather in Hangzhou"
                }
            ]
        },
        "parameters": {
            "result_format": "message",
            "enable_thinking": true,
            "incremental_output": true,
            "tools": [{
                "type": "function",
                "function": {
                    "name": "get_current_time",
                    "description": "Useful for knowing the current time.",
                    "parameters": {}
                }
            },{
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "description": "Useful for querying the weather of a specified city.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "City or district, e.g., Beijing, Hangzhou, Yuhang District, etc."
                            }
                        },
                        "required": ["location"]
                    }
                }
            }]
        }
    }'
    

After receiving the function calling information, you can run the tool functions and, optionally, have the model summarize their output. For details, see Run tool functions and LLM summarizing tool function output (optional).
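In the streamed response above, the function name arrives in the first tool_call fragment and the argument JSON is split across subsequent fragments. Below is a minimal sketch of reassembling and dispatching such fragments; the local `get_current_weather` implementation is a hypothetical placeholder, not part of the API:

```python
import json

def get_current_weather(location: str) -> str:
    # Hypothetical local tool implementation (placeholder only)
    return f"Sunny in {location}"

def run_tool_call(fragments):
    name, arguments = "", ""
    for fragment in fragments:
        for call in fragment.get("tool_calls", []):
            fn = call.get("function", {})
            name = fn.get("name") or name         # name arrives in the first fragment
            arguments += fn.get("arguments", "")  # argument JSON arrives in pieces
    args = json.loads(arguments)                  # e.g., {"location": "Hangzhou"}
    if name == "get_current_weather":
        return get_current_weather(**args)
    raise ValueError(f"Unknown tool: {name}")

# Fragments shaped like the streamed tool_call chunks above
fragments = [
    {"tool_calls": [{"function": {"name": "get_current_weather",
                                  "arguments": "{\"location\": \"Hangzhou"}}]},
    {"tool_calls": [{"function": {"arguments": "\"}"}}]},
]
print(run_tool_call(fragments))  # Sunny in Hangzhou
```

The key point is to concatenate the `arguments` strings across all fragments before calling `json.loads`; parsing any single fragment on its own fails, because each holds only part of the JSON.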

    Enable/disable thinking mode

    Apart from the enable_thinking parameter, Qwen3 also provides a convenient way to control the thinking mode dynamically through prompts. When enable_thinking is true, append /no_think to a prompt to turn off thinking mode for subsequent responses. To turn it back on in a multi-round conversation, append /think to the latest prompt.

    In multi-round conversations, the model will follow the most recent /think or /no_think command.
    If Qwen3 does not output its thinking process, output tokens are charged at the non-thinking price.
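Because /think and /no_think are plain-text suffixes, toggling the mode per turn reduces to string concatenation. A minimal sketch (the with_mode helper and the prompts are illustrative, not part of the API):

```python
def with_mode(prompt: str, thinking: bool) -> str:
    """Append the soft switch that controls thinking mode for this turn."""
    return prompt + (" /think" if thinking else " /no_think")

messages = []
# Turn 1: ask without thinking
messages.append({"role": "user", "content": with_mode("Who are you?", thinking=False)})
# ... send the request, then append the assistant's reply ...
messages.append({"role": "assistant", "content": "I am Qwen."})
# Turn 2: re-enable thinking for a harder question; the model follows
# the most recent /think or /no_think command
messages.append({"role": "user", "content": with_mode("Prove that 2+2=4.", thinking=True)})
```

This only works when enable_thinking is true in the request parameters; the suffix cannot turn thinking on if the parameter has disabled it.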

    OpenAI

    Python

    Sample code

    from openai import OpenAI
    import os
    
    # Initialize OpenAI client
    client = OpenAI(
        # If the environment variable is not configured, please replace with Model Studio API key: api_key="sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    
    # Add /no_think to the prompt, which will turn off the thinking mode even if enable_thinking is set to true.
    messages = [{"role": "user", "content": "Who are you/no_think"}]
    
    completion = client.chat.completions.create(
        model="qwen-plus-2025-04-28",  # You can replace with other Qwen3 models as needed
        messages=messages,
        # The enable_thinking parameter turns on the thinking process; it has no effect on the QwQ model, which always thinks
        extra_body={"enable_thinking": True},
        stream=True,
        # stream_options={
        #     "include_usage": True
        # },
    )
    
    reasoning_content = ""  # Complete reasoning process
    answer_content = ""  # Complete response
    is_answering = False  # Indicates whether the response phase has started
    print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
    
    for chunk in completion:
        if not chunk.choices:
            print("\nUsage:")
            print(chunk.usage)
            continue
    
        delta = chunk.choices[0].delta
    
        # Only collect reasoning content
        if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
            if not is_answering:
                print(delta.reasoning_content, end="", flush=True)
            reasoning_content += delta.reasoning_content
    
        # Receive content and begin to respond
        if hasattr(delta, "content") and delta.content:
            if not is_answering:
                print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
                is_answering = True
            print(delta.content, end="", flush=True)
            answer_content += delta.content
    

    Sample response

    ====================Thinking Process====================
    
    
    ====================Complete Response====================
    
    I am Qwen, an ultra-large-scale language model independently developed by the Tongyi Lab under Alibaba Group. I can assist you in answering questions, creating text, performing logical reasoning, coding, and other tasks. If you have any questions or need help, feel free to ask me anytime!

    Node.js

    Sample code

    import OpenAI from "openai";
    import process from 'process';
    
    // Initialize OpenAI client
    const openai = new OpenAI({
        apiKey: process.env.DASHSCOPE_API_KEY, // Read from environment variable
        baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
    });
    
    let reasoningContent = '';
    let answerContent = '';
    let isAnswering = false;
    
    async function main() {
        try {
            // Add /no_think to the prompt, which will turn off the thinking mode even if enable_thinking is set to true.
            const messages = [{ role: 'user', content: 'Who are you/no_think' }];
            const stream = await openai.chat.completions.create({
                // You can replace with other Qwen3 models as needed
                model: 'qwen-plus-2025-04-28',
                messages,
                stream: true,
                // The enable_thinking parameter turns on the thinking process; it has no effect on the QwQ model, which always thinks
                enable_thinking: true
            });
            console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
    
            for await (const chunk of stream) {
                if (!chunk.choices?.length) {
                    console.log('\nUsage:');
                    console.log(chunk.usage);
                    continue;
                }
    
                const delta = chunk.choices[0].delta;
    
                // Only collect reasoning content
                if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                    if (!isAnswering) {
                        process.stdout.write(delta.reasoning_content);
                    }
                    reasoningContent += delta.reasoning_content;
                }
    
                // Receive content and begin to respond
                if (delta.content !== undefined && delta.content) {
                    if (!isAnswering) {
                        console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
                        isAnswering = true;
                    }
                    process.stdout.write(delta.content);
                    answerContent += delta.content;
                }
            }
        } catch (error) {
            console.error('Error:', error);
        }
    }
    
    main();

    Sample response

    ====================Thinking Process====================
    
    
    ====================Complete Response====================
    
    I am Qwen, an ultra-large-scale language model independently developed by Tongyi Lab under Alibaba Group. I can assist with answering questions, creating text (such as stories, official documents, emails, scripts), logical reasoning, programming, and more. Additionally, I can express opinions and play games. If you have any questions or need help, feel free to ask me anytime!

    HTTP

    Sample code

    curl

    curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-plus-2025-04-28",
        "messages": [
            {
                "role": "user", 
                "content": "Who are you /no_think"
            }
        ],
        "stream": true,
        "stream_options": {
            "include_usage": true
        },
        "enable_thinking": true
    }'

    Sample response

    data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"finish_reason":null,"delta":{"content":"I"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" am Qwen,","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" a large-scale language","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" model independently developed by","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" the Tongyi Lab","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" under Alibaba Group.","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" I am capable of","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" answering questions, creating","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" text such as stories","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":", official documents,","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" emails, scripts,","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" performing logical reasoning,","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" coding, and more","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":". I can also","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" express opinions and play","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" games. If you","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" have any questions or","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" need assistance, feel","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" free to let me","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"delta":{"content":" know anytime!","reasoning_content":null},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":15,"completion_tokens":80,"total_tokens":95,"completion_tokens_details":{"reasoning_tokens":0}},"created":1746689786,"system_fingerprint":null,"model":"qwen-plus-2025-04-28","id":"chatcmpl-284e4638-e77b-9663-84f5-c46778baa018"}
    
    data: [DONE]
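The SSE lines above can be reduced to the final answer by concatenating the delta.content fields until the [DONE] sentinel. A minimal client-side sketch, assuming the raw data: lines have already been collected into a list:

```python
import json

def collect_answer(lines):
    """Concatenate delta.content from OpenAI-compatible SSE data lines."""
    answer = ""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break     # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # content may be null in the first/last chunks
            answer += choice.get("delta", {}).get("content") or ""
    return answer

sample = [
    'data: {"choices":[{"delta":{"content":"I"}}]}',
    'data: {"choices":[{"delta":{"content":" am Qwen."}}]}',
    "data: [DONE]",
]
print(collect_answer(sample))  # I am Qwen.
```

Note that the final usage chunk has an empty choices array, which this loop skips naturally; to read token counts, inspect chunk["usage"] when choices is empty.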

    DashScope

    Python

    Sample code

    import os
    from dashscope import Generation
    import dashscope
    dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"
    
    # Add the /no_think suffix to the prompt, which will turn off the thinking mode even if enable_thinking is set to true.
    messages = [{"role": "user", "content": "Who are you? /no_think"}]
    
    completion = Generation.call(
        # If the environment variable is not configured, please replace the following line with your Model Studio API key: api_key = "sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # You can replace with other Qwen3 models as needed
        model="qwen-plus-2025-04-28",
        messages=messages,
        result_format="message",
        enable_thinking=True,
        stream=True,
        incremental_output=True, 
    )
    
    # Define complete Thinking Process
    reasoning_content = ""
    # Define complete response
    answer_content = ""
    # Determine whether the Thinking Process has ended and the response has begun
    is_answering = False
    
    print("=" * 20 + "Thinking Process" + "=" * 20)
    
    for chunk in completion:
        # Ignore if both Thinking Process and response are empty
        if (
            chunk.output.choices[0].message.content == ""
            and chunk.output.choices[0].message.reasoning_content == ""
        ):
            pass
        else:
            # If currently in Thinking Process
            if (
                chunk.output.choices[0].message.reasoning_content != ""
                and chunk.output.choices[0].message.content == ""
            ):
                print(chunk.output.choices[0].message.reasoning_content, end="", flush=True)
                reasoning_content += chunk.output.choices[0].message.reasoning_content
            # If currently in response
            elif chunk.output.choices[0].message.content != "":
                if not is_answering:
                    print("\n" + "=" * 20 + "Complete Response" + "=" * 20)
                    is_answering = True
                print(chunk.output.choices[0].message.content, end="", flush=True)
                answer_content += chunk.output.choices[0].message.content
    
    # If you need to print the complete thinking process and complete response, uncomment the following lines and run
    # print("=" * 20 + "Complete Thinking Process" + "=" * 20 + "\n")
    # print(f"{reasoning_content}")
    # print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
    # print(f"{answer_content}")
    

    Sample response

    ====================Thinking Process====================
    
    ====================Complete Response====================
    Hello! I'm Qwen, and I'm really excited to tell you about myself! Think of me as your friendly AI companion, always ready to learn and help out. Whether you need help with coding, want to dive into some creative writing, or just have questions about any topic under the sun, I'm here to explore it all with you. 
    
    I love tackling challenges - from solving complex math problems to having deep conversations about philosophy. And don't get me started on my creative side! I can help you craft stories, poems, or any written content you can imagine. What makes me special is how I can switch between different modes to best suit our conversation - kind of like a Swiss Army knife for your curiosity!
    
    Want to have a casual chat or dive deep into some serious learning? I'm equally comfortable with both! Let's embark on this journey of discovery together - what would you like to explore first?

    Java

    Sample code

    // Version of dashscope SDK >= 2.19.4
    import com.alibaba.dashscope.aigc.generation.Generation;
    import com.alibaba.dashscope.aigc.generation.GenerationParam;
    import com.alibaba.dashscope.aigc.generation.GenerationResult;
    import com.alibaba.dashscope.common.Message;
    import com.alibaba.dashscope.common.Role;
    import com.alibaba.dashscope.exception.ApiException;
    import com.alibaba.dashscope.exception.InputRequiredException;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import com.alibaba.dashscope.utils.Constants;
    import io.reactivex.Flowable;
    import java.lang.System;
    import java.util.Arrays;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    public class Main {
        private static final Logger logger = LoggerFactory.getLogger(Main.class);
        private static StringBuilder reasoningContent = new StringBuilder();
        private static StringBuilder finalContent = new StringBuilder();
        private static boolean isFirstPrint = true;
    
        private static void handleGenerationResult(GenerationResult message) {
            String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
            String content = message.getOutput().getChoices().get(0).getMessage().getContent();
    
            if (!reasoning.isEmpty()) {
                reasoningContent.append(reasoning);
                if (isFirstPrint) {
                    System.out.println("====================Thinking Process====================");
                    isFirstPrint = false;
                }
                System.out.print(reasoning);
            }
    
            if (!content.isEmpty()) {
                finalContent.append(content);
                if (!isFirstPrint) {
                    System.out.println("\n====================Complete Response====================");
                    isFirstPrint = true;
                }
                System.out.print(content);
            }
        }
    
        private static GenerationParam buildGenerationParam(Message userMsg) {
            return GenerationParam.builder()
                    // If the environment variable is not configured, please replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                    // This uses the qwen-plus-2025-04-28 model; you can replace it with other Qwen3 models as needed
                    .model("qwen-plus-2025-04-28")
                    .enableThinking(true)
                    .incrementalOutput(true)
                    .resultFormat("message")
                    .messages(Arrays.asList(userMsg))
                    .build();
        }
    
        public static void streamCallWithMessage(Generation gen, Message userMsg)
                throws NoApiKeyException, ApiException, InputRequiredException {
            GenerationParam param = buildGenerationParam(userMsg);
            Flowable<GenerationResult> result = gen.streamCall(param);
            result.blockingForEach(message -> handleGenerationResult(message));
        }
    
        public static void main(String[] args) {
            try {
                Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
                // Add /no_think to the prompt, which will turn off the thinking mode even if enable_thinking is set to true.
                Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?/no_think").build();
                streamCallWithMessage(gen, userMsg);
    //            // If you need to print the complete thinking process and complete response, uncomment the following lines
    //            if (reasoningContent.length() > 0) {
    //                System.out.println("\n====================Complete Thinking Process====================");
    //                System.out.println(reasoningContent.toString());
    //            }
    //            if (finalContent.length() > 0) {
    //                System.out.println("\n====================Complete Response====================");
    //                System.out.println(finalContent.toString());
    //            }
            } catch (ApiException | NoApiKeyException | InputRequiredException e) {
                logger.error("An exception occurred: {}", e.getMessage());
            }
            System.exit(0);
        }
    }
    

    Sample response

    I am Qwen, an ultra-large-scale language model independently developed by Tongyi Lab under Alibaba Group. I can help you answer questions, create texts (such as stories, official documents, emails, scripts), perform logical reasoning, programming, and more. Additionally, I can express opinions and play games. If you have any questions or need assistance, feel free to ask me anytime!

    HTTP

    Sample code

    curl

    curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -H "X-DashScope-SSE: enable" \
    -d '{
        "model": "qwen-plus-2025-04-28",
        "input":{
            "messages":[      
                {
                    "role": "user",
                    "content": "Who are you /no_think"
                }
            ]
        },
        "parameters":{
            "enable_thinking": true,
            "incremental_output": true,
            "result_format": "message"
        }
    }'

    Sample response

    id:1
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"I","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":20,"output_tokens":5,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:2
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":" am a large-scale","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":24,"output_tokens":9,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:3
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":" language model independently developed","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":28,"output_tokens":13,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:4
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":" by the Tongyi","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":32,"output_tokens":17,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:5
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":" Lab under Alibaba Group","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":36,"output_tokens":21,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:6
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":". My name is","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":40,"output_tokens":25,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:7
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":" Qwen. I","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":44,"output_tokens":29,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:8
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":" am capable of answering","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":48,"output_tokens":33,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:9
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":" questions, creating text","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":52,"output_tokens":37,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:10
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":" such as stories,","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":56,"output_tokens":41,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:11
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":" official documents, emails","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":60,"output_tokens":45,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:12
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":", scripts, performing","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":64,"output_tokens":49,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:13
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":" logical reasoning, coding","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":68,"output_tokens":53,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:14
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":", and more.","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":72,"output_tokens":57,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:15
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":" I can also express","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":76,"output_tokens":61,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:16
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":" opinions and play games","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":80,"output_tokens":65,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:17
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":". If you have","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":84,"output_tokens":69,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:18
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":" any questions or need","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":88,"output_tokens":73,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:19
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":" assistance, feel free","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":92,"output_tokens":77,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:20
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":" to ask me anytime","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":96,"output_tokens":81,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:21
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":97,"output_tokens":82,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}
    
    id:22
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":97,"output_tokens":82,"input_tokens":15,"output_tokens_details":{"reasoning_tokens":0}},"request_id":"a3e7bd75-db44-9356-96fc-c69b5aa97b80"}

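Each `data:` event above carries one delta of the reply: `reasoning_content` holds the thinking process (empty here because thinking was disabled) and `content` holds the response text. A minimal sketch of client-side assembly, assuming the raw SSE text has already been read into a string; it skips `id:`, `event:`, and comment lines and concatenates the deltas in order:

```python
import json

def assemble_stream(sse_text):
    """Concatenate reasoning_content and content deltas from a DashScope SSE stream."""
    reasoning, answer = [], []
    for line in sse_text.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip id:, event:, and :HTTP_STATUS comment lines
        payload = json.loads(line[len("data:"):])
        msg = payload["output"]["choices"][0]["message"]
        reasoning.append(msg.get("reasoning_content") or "")
        answer.append(msg.get("content") or "")
    return "".join(reasoning), "".join(answer)

# Two chunks in the same format as the stream above:
chunks = (
    'data:{"output":{"choices":[{"message":{"content":" to ask me anytime",'
    '"reasoning_content":"","role":"assistant"},"finish_reason":"null"}]}}\n'
    '\n'
    'data:{"output":{"choices":[{"message":{"content":"!",'
    '"reasoning_content":"","role":"assistant"},"finish_reason":"null"}]}}\n'
)
thinking, reply = assemble_stream(chunks)
# reply == " to ask me anytime!"
```

For a thinking-mode model, print the `reasoning_content` deltas first and then the `content` deltas, mirroring the order in which the model emits them.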
    Usage notes

    To achieve the best reasoning performance, do not set a System Message. Instead, specify the purpose, output format, and other requirements in the User Message.
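For example, a request that folds all requirements into the User Message might build its messages list like this (the prompt text is illustrative):

```python
# Keep all instructions in the user message; leave the system message unset.
messages = [
    {
        "role": "user",
        "content": "Explain what an SSE stream is. Answer in English, in two sentences.",
    }
]
```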

    FAQ

    Q: How do I disable the thinking process?

    It depends on the model you are using:

    • Qwen3:

      The model supports both thinking and non-thinking modes. Set enable_thinking to false to disable thinking.

    • QwQ:

      The thinking process cannot be disabled.

    Q: How do I purchase tokens when my free quota runs out?

    Go to Expenses and Costs to recharge. Make sure your account has no overdue payments.

    After you exceed the free quota, fees are automatically deducted from your account. Bills are generated about one hour after usage. You can view consumption details in Bill Details.

    Q: Can I upload images or documents in questions?

    Qwen3 and QwQ support only text input; they do not accept images or documents. QVQ supports deep thinking based on images.

    Q: How do I view token usage and the number of API calls?

    About one hour after the model is called, go to Model Observation. Set the filter conditions (for example, select a time range or workspace). Then, in the Models section, find the target model and click Monitor to view its call statistics. For more information, see Model observation.

    Data is updated hourly. During peak periods, updates may be delayed by an hour or more.


    API references

    For the input and output parameters, see Qwen.

    Error codes

    If a call fails and an error message is returned, see Error messages.
