Alibaba Cloud Model Studio: Deep thinking

    Last Updated: Feb 28, 2026

    Deep thinking models reason before responding to improve accuracy for complex tasks such as logical reasoning and numerical calculations. This topic describes how to call deep thinking models such as Qwen and DeepSeek.


    Usage

    Model Studio provides APIs for various deep thinking models. These APIs support two modes: hybrid thinking and thinking-only.

    • Hybrid thinking mode: Use the enable_thinking parameter to control whether the model thinks before responding:

      • true: The model responds after thinking.

      • false: The model responds directly.

      OpenAI compatible

      # Import dependencies and create a client...
      completion = client.chat.completions.create(
          model="qwen-plus", # Select a model
          messages=[{"role": "user", "content": "Who are you"}],    
          # Because enable_thinking is not a standard OpenAI parameter, pass it through extra_body.
          extra_body={"enable_thinking":True},
          # Call in streaming output mode.
          stream=True,
          # Make the last packet of the streaming response include token consumption information.
          stream_options={
              "include_usage": True
          }
      )

      DashScope

      The DashScope API for Qwen3.5 uses a multimodal interface, so the following example returns a URL error for Qwen3.5 models. For the correct invocation method, see Enable or disable thinking mode.
      # Import dependencies...
      
      response = Generation.call(
          # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key = "sk-xxx",
          api_key=os.getenv("DASHSCOPE_API_KEY"),
          # You can replace this with other deep thinking models as needed.
          model="qwen-plus",
          messages=messages,
          result_format="message",
          enable_thinking=True,
          stream=True,
          incremental_output=True
      )
    • Thinking-only mode: The model always thinks before responding, and this feature cannot be disabled. The request format is the same as hybrid thinking mode, except you do not need to set the enable_thinking parameter.

    The reasoning content is returned in the reasoning_content field, and the response content is returned in the content field. Because deep thinking models reason before responding, response latency increases. Most of these models support only streaming output, so this topic uses streaming calls in its examples.

    Supported models

    Qwen3.5

    • Commercial edition

      • Qwen3.5 Plus series (hybrid thinking mode, enabled by default): qwen3.5-plus, qwen3.5-plus-2026-02-15

      • Qwen3.5 Flash series (hybrid thinking mode, enabled by default): qwen3.5-flash, qwen3.5-flash-2026-02-23

    • Open source edition

      • Hybrid thinking mode, enabled by default: qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b

    Qwen3

    • Commercial edition

      • Qwen Max series (hybrid thinking mode, disabled by default): qwen3-max-2026-01-23, qwen3-max-preview

      • Qwen Plus series (hybrid thinking mode, disabled by default): qwen-plus, qwen-plus-latest, qwen-plus-2025-04-28 and later snapshots

      • Qwen Flash series (hybrid thinking mode, disabled by default): qwen-flash, qwen-flash-2025-07-28 and later snapshots

      • Qwen Turbo series (hybrid thinking mode, disabled by default): qwen-turbo, qwen-turbo-latest, qwen-turbo-2025-04-28 and later snapshots

    • Open source edition

      • Hybrid thinking mode, enabled by default: qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b

      • Thinking-only mode: qwen3-next-80b-a3b-thinking, qwen3-235b-a22b-thinking-2507, qwen3-30b-a3b-thinking-2507

    QwQ (based on Qwen2.5)

    Thinking-only mode: qwq-plus, qwq-plus-latest, qwq-plus-2025-03-05, qwq-32b

    DeepSeek (Beijing)

    • Hybrid thinking mode, disabled by default: deepseek-v3.2, deepseek-v3.2-exp, deepseek-v3.1

    • Thinking-only mode: deepseek-r1, deepseek-r1-0528, deepseek-r1 distilled models

    GLM (Beijing)

    Hybrid thinking mode, enabled by default: glm-5, glm-4.7, glm-4.6

    Kimi (Beijing)

    Thinking-only mode: kimi-k2-thinking

    For information such as model names, context window, pricing, and snapshot versions, see Model list. For information about rate limits, see Rate limiting.

    Getting started

    Prerequisites: You have obtained an API key and configured it as an environment variable. If you make calls using an SDK, install the OpenAI or DashScope SDK. The DashScope Java SDK version must be 2.19.4 or later.
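    For example, in a Python environment the setup can look like the following. This is a sketch: the PyPI package names openai and dashscope are the standard distributions of these SDKs, and the key value shown is a placeholder.

```shell
# Install the OpenAI and DashScope Python SDKs from PyPI.
pip install -U openai dashscope
# Configure the API key as an environment variable; replace the placeholder with your actual key.
export DASHSCOPE_API_KEY="sk-xxx"
```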

    Run the following code to call qwen-plus in thinking mode with streaming output.

    OpenAI compatible

    Python

    Sample code

    from openai import OpenAI
    import os
    
    # Initialize the OpenAI client
    client = OpenAI(
        # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The following is the base_url for the Singapore region. If you use a model in the Virginia region, replace the base_url with https://dashscope-us.aliyuncs.com/compatible-mode/v1
        # If you use a model in the Beijing region, replace the base_url with https://dashscope.aliyuncs.com/compatible-mode/v1
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    
    messages = [{"role": "user", "content": "Who are you"}]
    
    completion = client.chat.completions.create(
        model="qwen-plus",  # You can replace this with other deep thinking models as needed.
        messages=messages,
        extra_body={"enable_thinking": True},
        stream=True,
        stream_options={
            "include_usage": True
        },
    )
    
    reasoning_content = ""  # Full thinking process
    answer_content = ""  # Full response
    is_answering = False  # Whether the response phase has started
    print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")
    
    for chunk in completion:
        if not chunk.choices:
            print("\nUsage:")
            print(chunk.usage)
            continue
    
        delta = chunk.choices[0].delta
    
        # Collect only the reasoning content
        if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
            if not is_answering:
                print(delta.reasoning_content, end="", flush=True)
            reasoning_content += delta.reasoning_content
    
        # When content is received, start responding
        if hasattr(delta, "content") and delta.content:
            if not is_answering:
                print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
                is_answering = True
            print(delta.content, end="", flush=True)
            answer_content += delta.content
    

    Response

    ====================Thinking process====================
    
    Okay, the user is asking "Who are you". I need to provide an accurate and friendly answer. First, I must confirm my identity, which is Qwen, developed by the Tongyi Lab under Alibaba Group. Next, I should explain my main functions, such as answering questions, creating text, and logical reasoning. I should also maintain a friendly tone and avoid being too technical to make the user feel at ease. I must also be careful not to use complex terminology and ensure the answer is concise and clear. Additionally, I might need to add some interactive elements, inviting the user to ask questions to encourage further communication. Finally, I will check if I have missed any important information, such as my Chinese name "Qwen" and English name "Qwen", along with my parent company and lab. I need to ensure the answer is comprehensive and meets the user's expectations.
    ====================Full response====================
    
    Hello! I am Qwen, an ultra-large-scale language model independently developed by the Tongyi Lab under Alibaba Group. I can answer questions, create text, perform logical reasoning, write code, and more, with the goal of providing users with high-quality information and services. You can call me Qwen. How can I help you?

    Node.js

    Sample code

    import OpenAI from "openai";
    import process from 'process';
    
    // Initialize the OpenAI client
    const openai = new OpenAI({
        apiKey: process.env.DASHSCOPE_API_KEY, // Read from environment variables
        // The following is the base_url for the Singapore region. If you use a model in the Virginia region, replace the base_url with https://dashscope-us.aliyuncs.com/compatible-mode/v1 
        // If you use a model in the Beijing region, replace the base_url with https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
    });
    
    let reasoningContent = '';
    let answerContent = '';
    let isAnswering = false;
    
    async function main() {
        try {
            const messages = [{ role: 'user', content: 'Who are you' }];
            const stream = await openai.chat.completions.create({
                model: 'qwen-plus',
                messages,
                stream: true,
                enable_thinking: true
            });
            console.log('\n' + '='.repeat(20) + 'Thinking process' + '='.repeat(20) + '\n');
    
            for await (const chunk of stream) {
                if (!chunk.choices?.length) {
                    console.log('\nUsage:');
                    console.log(chunk.usage);
                    continue;
                }
    
                const delta = chunk.choices[0].delta;
                
                // Collect only the reasoning content
                if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                    if (!isAnswering) {
                        process.stdout.write(delta.reasoning_content);
                    }
                    reasoningContent += delta.reasoning_content;
                }
    
                // When content is received, start responding
                if (delta.content !== undefined && delta.content) {
                    if (!isAnswering) {
                        console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
                        isAnswering = true;
                    }
                    process.stdout.write(delta.content);
                    answerContent += delta.content;
                }
            }
        } catch (error) {
            console.error('Error:', error);
        }
    }
    
    main();

    Response

    ====================Thinking process====================
    
    Okay, the user is asking "Who are you". I need to respond with my identity. First, I should clearly state that I am Qwen, an ultra-large-scale language model developed by Alibaba Cloud. Next, I can mention my main functions, such as answering questions, creating text, and logical reasoning. I should also emphasize my multilingual support, including Chinese and English, so the user knows I can handle requests in different languages. Additionally, I might need to explain my application scenarios, such as helping with study, work, and daily life. However, the user's question is quite direct, so I probably don't need to provide too much detail. I should keep it concise and clear. At the same time, I must ensure a friendly tone and invite the user to ask further questions. I will check for any missed important information, such as my version or latest updates, but the user probably doesn't need that level of detail. Finally, I will confirm that the answer is accurate and free of errors.
    ====================Full response====================
    
    I am Qwen, an ultra-large-scale language model independently developed by the Tongyi Lab under Alibaba Group. I am capable of various tasks, including answering questions, creating text, logical reasoning, and coding. I support multiple languages, including Chinese and English. If you have any questions or need help, feel free to let me know!

    HTTP

    Sample code

    curl

    # ======= Important =======
    # The following is the base_url for Singapore. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
    # If you use a model in the Virginia region, replace the base_url with: https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
    # === Delete this comment before execution ===
    curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-plus",
        "messages": [
            {
                "role": "user", 
                "content": "Who are you"
            }
        ],
        "stream": true,
        "stream_options": {
            "include_usage": true
        },
        "enable_thinking": true
    }'

    Response

    data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
    
    .....
    
    data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
    
    data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":10,"completion_tokens":360,"total_tokens":370},"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
    
    data: [DONE]

    DashScope

    The DashScope API for Qwen3.5 uses a multimodal interface, so the following example returns a URL error for Qwen3.5 models. For the correct invocation method, see Enable or disable thinking mode.

    Python

    Sample code

    import os
    from dashscope import Generation
    import dashscope
    
    # Base URL for the Singapore region. For the US (Virginia) region, use https://dashscope-us.aliyuncs.com/api/v1.
    # For the China (Beijing) region, use https://dashscope.aliyuncs.com/api/v1.
    dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"
    
    messages = [{"role": "user", "content": "Who are you?"}]
    
    completion = Generation.call(
        # Replace the following line with api_key = "sk-xxx" if you do not configure the environment variable.
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        model="qwen-plus",
        messages=messages,
        result_format="message",
        enable_thinking=True,
        stream=True,
        incremental_output=True,
    )
    
    # Store the full reasoning content.
    reasoning_content = ""
    # Store the full response.
    answer_content = ""
    # Track whether reasoning has ended and the response has started.
    is_answering = False
    
    print("=" * 20 + "Reasoning process" + "=" * 20)
    
    for chunk in completion:
        message = chunk.output.choices[0].message
        # Skip chunks that carry neither reasoning nor response content.
        if message.content == "" and message.reasoning_content == "":
            continue
        # Print reasoning content.
        if message.reasoning_content != "" and message.content == "":
            print(message.reasoning_content, end="", flush=True)
            reasoning_content += message.reasoning_content
        # Print response content.
        elif message.content != "":
            if not is_answering:
                print("\n" + "=" * 20 + "Full response" + "=" * 20)
                is_answering = True
            print(message.content, end="", flush=True)
            answer_content += message.content
    
    # Uncomment the following lines to print the full reasoning content and full response.
    # print("=" * 20 + "Full reasoning content" + "=" * 20 + "\n")
    # print(f"{reasoning_content}")
    # print("=" * 20 + "Full response" + "=" * 20 + "\n")
    # print(f"{answer_content}")
    
    

    Response

    ====================Reasoning process====================
    The user asks: “Who are you?” I need to answer this question. First, I identify myself as Qwen, a large-scale language model developed by Alibaba Cloud. Next, I describe my capabilities, such as answering questions, generating text, logical reasoning, and programming. My goal is to support users as a helpful assistant.
    
    I keep my tone conversational and avoid technical terms or complex sentences. I add friendly phrases like “Hello!” to make the interaction natural. I ensure accuracy and cover key points, including my developer, main functions, and use cases.
    
    I also anticipate follow-up questions, such as examples or technical details. So I hint at broader support—for example, by saying “I can help with everyday questions or professional topics.” This keeps the response open and inviting.
    
    Finally, I check for flow, repetition, or redundancy. I keep it concise, friendly, and professional.
    ====================Full response====================
    Hello! I am Qwen, a large-scale language model developed by Alibaba Cloud. I can answer questions, generate text—such as stories, official documents, emails, scripts—and perform logical reasoning and programming. I aim to support and assist you. Whether your question is about daily life or a professional topic, I will do my best to help. Is there anything I can assist you with?

    Java

    Sample code

    // dashscope SDK version >= 2.19.4
    import java.util.Arrays;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import com.alibaba.dashscope.aigc.generation.Generation;
    import com.alibaba.dashscope.aigc.generation.GenerationParam;
    import com.alibaba.dashscope.aigc.generation.GenerationResult;
    import com.alibaba.dashscope.common.Message;
    import com.alibaba.dashscope.common.Role;
    import com.alibaba.dashscope.exception.ApiException;
    import com.alibaba.dashscope.exception.InputRequiredException;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import io.reactivex.Flowable;
    import java.lang.System;
    import com.alibaba.dashscope.utils.Constants;
    
    public class Main {
        static {
            // Base URL for the Singapore region. For the US (Virginia) region, use https://dashscope-us.aliyuncs.com/api/v1.
            // For the China (Beijing) region, use https://dashscope.aliyuncs.com/api/v1.
            Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
        }
        private static final Logger logger = LoggerFactory.getLogger(Main.class);
        private static StringBuilder reasoningContent = new StringBuilder();
        private static StringBuilder finalContent = new StringBuilder();
        private static boolean isFirstPrint = true;
    
        private static void handleGenerationResult(GenerationResult message) {
            String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
            String content = message.getOutput().getChoices().get(0).getMessage().getContent();
    
            if (!reasoning.isEmpty()) {
                reasoningContent.append(reasoning);
                if (isFirstPrint) {
                    System.out.println("====================Reasoning process====================");
                    isFirstPrint = false;
                }
                System.out.print(reasoning);
            }
    
            if (!content.isEmpty()) {
                finalContent.append(content);
                if (!isFirstPrint) {
                    System.out.println("\n====================Full response====================");
                    isFirstPrint = true;
                }
                System.out.print(content);
            }
        }
        private static GenerationParam buildGenerationParam(Message userMsg) {
            return GenerationParam.builder()
                    // Replace the following line with .apiKey("sk-xxx") if you do not configure the environment variable.
                    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                    .model("qwen-plus")
                    .enableThinking(true)
                    .incrementalOutput(true)
                    .resultFormat("message")
                    .messages(Arrays.asList(userMsg))
                    .build();
        }
        public static void streamCallWithMessage(Generation gen, Message userMsg)
                throws NoApiKeyException, ApiException, InputRequiredException {
            GenerationParam param = buildGenerationParam(userMsg);
            Flowable<GenerationResult> result = gen.streamCall(param);
            result.blockingForEach(message -> handleGenerationResult(message));
        }
    
        public static void main(String[] args) {
            try {
                Generation gen = new Generation();
                Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
                streamCallWithMessage(gen, userMsg);
    //             Print the final result.
    //            if (reasoningContent.length() > 0) {
    //                System.out.println("\n====================Full response====================");
    //                System.out.println(finalContent.toString());
    //            }
            } catch (ApiException | NoApiKeyException | InputRequiredException e) {
                logger.error("An exception occurred: {}", e.getMessage());
            }
            System.exit(0);
        }
    }

    Response

    ====================Reasoning process====================
    The user asks, “Who are you?” I must answer based on my identity. I am Qwen, a large-scale language model from Alibaba Group. I keep my reply conversational and simple.
    
    The user may be new to me or confirming my identity. I start by stating who I am, then briefly list my abilities—answering questions, writing stories, drafting documents, coding, and more. I mention multilingual support so users know I handle multiple languages.
    
    To sound human, I use a friendly tone and maybe an emoji. I also invite further questions or tasks, like asking how I can help.
    
    I avoid jargon and long sentences. I double-check for missing points, like multilingual support and core skills. I ensure the reply is clear, friendly, and professional.
    ====================Full response====================
    Hello! I am Qwen, a large-scale language model from Alibaba Group. I can answer questions, write stories, draft official documents, compose emails, create scripts, perform logical reasoning, code, express opinions, and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. How can I help you?

    HTTP

    Sample code

    curl

    # ======= Important notice =======
    # The following URL is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
    # For the US (Virginia) region, use: https://dashscope-us.aliyuncs.com/api/v1/services/aigc/text-generation/generation
    # === Remove this comment before running ===
    curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -H "X-DashScope-SSE: enable" \
    -d '{
        "model": "qwen-plus",
        "input":{
            "messages":[      
                {
                    "role": "user",
                    "content": "Who are you?"
                }
            ]
        },
        "parameters":{
            "enable_thinking": true,
            "incremental_output": true,
            "result_format": "message"
        }
    }'

    Response

    id:1
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Okay","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"input_tokens":11,"output_tokens":3},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:2
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"input_tokens":11,"output_tokens":4},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:3
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"user","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":16,"input_tokens":11,"output_tokens":5},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:4
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"asks","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":17,"input_tokens":11,"output_tokens":6},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:5
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"\"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":18,"input_tokens":11,"output_tokens":7},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    ......
    
    id:358
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"Help","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":373,"input_tokens":11,"output_tokens":362},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:359
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":",","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":374,"input_tokens":11,"output_tokens":363},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:360
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"Feel free","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":375,"input_tokens":11,"output_tokens":364},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:361
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"to","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":376,"input_tokens":11,"output_tokens":365},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:362
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"let me know","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":377,"input_tokens":11,"output_tokens":366},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:363
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:364
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

    Core capabilities

    Switch between thinking and non-thinking modes

    Enabling thinking mode usually improves response quality but increases latency and cost. When using models that support hybrid thinking mode, you can dynamically switch between thinking and non-thinking modes based on question complexity without changing models:

    • For tasks that do not require complex reasoning (such as casual chat or simple Q&A), set enable_thinking to false to disable thinking mode.

    • For tasks that require complex reasoning (such as logical reasoning, code generation, or math problem solving), set enable_thinking to true to enable thinking mode.
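    Because the switch is just a request parameter, per-task routing can be reduced to a helper that builds the request arguments. The sketch below is illustrative: the helper name and the boolean flag are not part of the API, and how you decide a task is "complex" is up to your application. Pass the resulting dict to client.chat.completions.create(**params) with an OpenAI-compatible client.

```python
def request_params(question: str, needs_reasoning: bool) -> dict:
    """Build chat-completion arguments, toggling thinking mode per task.

    needs_reasoning=False suits casual chat or simple Q&A (lower latency
    and cost); needs_reasoning=True suits logic, code, or math tasks.
    """
    return {
        "model": "qwen-plus",
        "messages": [{"role": "user", "content": question}],
        # enable_thinking is not a standard OpenAI parameter, so it goes
        # through extra_body in the OpenAI Python SDK.
        "extra_body": {"enable_thinking": needs_reasoning},
        "stream": True,
    }
```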

    OpenAI compatibility

    Important

    enable_thinking is not an OpenAI standard parameter. If you use the OpenAI Python SDK, pass it through extra_body. In the Node.js SDK, pass it as a top-level parameter.

    Python

    Example code

    from openai import OpenAI
    import os
    
    # Initialize the OpenAI client
    client = OpenAI(
        # If you haven't configured an environment variable, replace this with your Alibaba Cloud Model Studio API key: api_key="sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # This is the base_url for the Singapore region. If you use a model in the Virginia region, replace base_url with https://dashscope-us.aliyuncs.com/compatible-mode/v1. If you use a model in the Beijing region, replace base_url with https://dashscope.aliyuncs.com/compatible-mode/v1
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    
    messages = [{"role": "user", "content": "Who are you?"}]
    completion = client.chat.completions.create(
        model="qwen-plus",
        messages=messages,
        # Use extra_body to set enable_thinking and enable the reasoning process
        extra_body={"enable_thinking": True},
        stream=True,
        stream_options={
            "include_usage": True
        },
    )
    
    reasoning_content = ""  # Full reasoning process
    answer_content = ""  # Full response
    is_answering = False  # Whether the response phase has started
    print("\n" + "=" * 20 + "Reasoning process" + "=" * 20 + "\n")
    
    for chunk in completion:
        if not chunk.choices:
            print("\n" + "=" * 20 + "Token usage" + "=" * 20 + "\n")
            print(chunk.usage)
            continue
    
        delta = chunk.choices[0].delta
    
        # Collect only reasoning content
        if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
            if not is_answering:
                print(delta.reasoning_content, end="", flush=True)
            reasoning_content += delta.reasoning_content
    
        # Received content; start responding
        if hasattr(delta, "content") and delta.content:
            if not is_answering:
                print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
                is_answering = True
            print(delta.content, end="", flush=True)
            answer_content += delta.content
    

    Response

    ====================Reasoning process====================
    
    Hmm, the user asked "Who are you?" I need to figure out what they really want to know. They might be encountering me for the first time or verifying my identity. I should start by introducing myself as Qwen, developed by Tongyi Lab. Then explain my capabilities—answering questions, generating text, coding, etc.—so users understand how I can help. Mentioning multilingual support shows international users they can interact in their preferred language. End with a friendly invitation to ask more questions to encourage further interaction. Keep it concise and avoid excessive technical jargon so it's easy to understand. The user likely wants a quick overview of my abilities, so focus on features and use cases. Also check if any key details are missing, like mentioning Alibaba Group or deeper technical specs. But basic info is probably sufficient here. Ensure the tone stays friendly and professional while inviting follow-up questions.
    ====================Full response====================
    
    I am Qwen, a large-scale language model developed by Tongyi Lab. I can help you answer questions, create text, write code, express opinions, and more—all in multiple languages. Is there anything I can assist you with?
    ====================Token usage====================
    
    CompletionUsage(completion_tokens=221, prompt_tokens=10, total_tokens=231, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=172, rejected_prediction_tokens=None), prompt_tokens_details=PromptTokensDetails(audio_tokens=None, cached_tokens=0))
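    In the usage above, `reasoning_tokens` inside `completion_tokens_details` counts the thinking phase, which is billed as part of `completion_tokens`. The figures shown can be cross-checked:

```python
# Figures taken from the CompletionUsage output above.
prompt_tokens = 10
completion_tokens = 221
reasoning_tokens = 172

answer_tokens = completion_tokens - reasoning_tokens  # tokens in the visible reply
total_tokens = prompt_tokens + completion_tokens

print(answer_tokens, total_tokens)  # 49 231
```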

    Node.js

    Example code

    import OpenAI from "openai";
    import process from 'process';
    
    // Initialize the OpenAI client
    const openai = new OpenAI({
        // If you haven't configured an environment variable, replace this with your Alibaba Cloud Model Studio API key: apiKey: "sk-xxx"
        apiKey: process.env.DASHSCOPE_API_KEY, 
        // This is the base_url for the Singapore region. If you use a model in the Virginia region, replace baseURL with https://dashscope-us.aliyuncs.com/compatible-mode/v1.
        // If you use a model in the Beijing region, replace baseURL with https://dashscope.aliyuncs.com/compatible-mode/v1.
        baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
    });
    
    let reasoningContent = ''; // Full reasoning process
    let answerContent = ''; // Full response
    let isAnswering = false; // Whether the response phase has started
    
    async function main() {
        try {
            const messages = [{ role: 'user', content: 'Who are you?' }];
            
            const stream = await openai.chat.completions.create({
                model: 'qwen-plus',
                messages,
                // In the Node.js SDK, non-standard parameters like enable_thinking are passed as top-level properties, not inside extra_body
                enable_thinking: true,
                stream: true,
                stream_options: {
                    include_usage: true
                },
            });
    
            console.log('\n' + '='.repeat(20) + 'Reasoning process' + '='.repeat(20) + '\n');
    
            for await (const chunk of stream) {
                if (!chunk.choices?.length) {
                    console.log('\n' + '='.repeat(20) + 'Token usage' + '='.repeat(20) + '\n');
                    console.log(chunk.usage);
                    continue;
                }
    
                const delta = chunk.choices[0].delta;
                
                // Collect only reasoning content
                if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                    if (!isAnswering) {
                        process.stdout.write(delta.reasoning_content);
                    }
                    reasoningContent += delta.reasoning_content;
                }
    
                // Received content; start responding
                if (delta.content !== undefined && delta.content) {
                    if (!isAnswering) {
                        console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
                        isAnswering = true;
                    }
                    process.stdout.write(delta.content);
                    answerContent += delta.content;
                }
            }
        } catch (error) {
            console.error('Error:', error);
        }
    }
    
    main();

    Response

    ====================Reasoning process====================
    
    Hmm, the user asked "Who are you?" I need to determine what they're looking for. They might be new to me or confirming my identity. Start by introducing my name—Qwen—and mention I'm a large-scale language model independently developed by Tongyi Lab under Alibaba Group. Next, highlight my capabilities: answering questions, creating text (like stories, official documents, emails, scripts), logical reasoning, coding, expressing opinions, and even playing games. Emphasize multilingual support—including Chinese, English, German, French, Spanish, and more—so international users feel included. End with an open, friendly invitation to ask questions. Keep language simple and conversational, avoiding complex sentences or jargon. The user might be testing my abilities or seeking specific help, but for a first reply, stick to core info and guidance. Stay approachable to encourage further interaction.
    ====================Full response====================
    
    Hello! I'm Qwen, a large-scale language model independently developed by Tongyi Lab under Alibaba Group. I can help you answer questions, create text (like stories, official documents, emails, scripts), perform logical reasoning, write code, express opinions, and even play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish.
    
    If you have any questions or need assistance, just let me know!
    ====================Token usage====================
    
    {
      prompt_tokens: 10,
      completion_tokens: 288,
      total_tokens: 298,
      completion_tokens_details: { reasoning_tokens: 188 },
      prompt_tokens_details: { cached_tokens: 0 }
    }

    HTTP

    Example code

    curl

    # ======= Important notes =======
    # This is the base_url for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
    # If you use a model in the Virginia region, replace the URL with: https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
    # === Delete this comment before execution ===
    curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-plus",
        "messages": [
            {
                "role": "user", 
                "content": "Who are you?"
            }
        ],
        "stream": true,
        "stream_options": {
            "include_usage": true
        },
        "enable_thinking": true
    }'

    DashScope

    The DashScope API for Qwen3.5 uses a multimodal interface, so the following example returns a URL error. For the correct invocation method, see Enable or disable thinking mode.

    Python

    Example code

    import os
    from dashscope import Generation
    import dashscope
    # This is the base_url for the Singapore region. If you use a model in the Virginia region, replace base_url with https://dashscope-us.aliyuncs.com/api/v1
    # If you use a model in the Beijing region, replace base_url with https://dashscope.aliyuncs.com/api/v1
    dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"
    
    # Initialize request parameters
    messages = [{"role": "user", "content": "Who are you?"}]
    
    completion = Generation.call(
        # If you haven't configured an environment variable, replace this with your Alibaba Cloud Model Studio API key: api_key="sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        model="qwen-plus",
        messages=messages,
        result_format="message",  # Set result format to message
        enable_thinking=True,     # Enable reasoning process
        stream=True,              # Enable streaming output
        incremental_output=True,  # Enable incremental output
    )
    
    reasoning_content = ""  # Full reasoning process
    answer_content = ""     # Full response
    is_answering = False    # Whether the response phase has started
    
    print("\n" + "=" * 20 + "Reasoning process" + "=" * 20 + "\n")
    
    for chunk in completion:
        message = chunk.output.choices[0].message
        
        # Collect only reasoning content
        if message.reasoning_content:
            if not is_answering:
                print(message.reasoning_content, end="", flush=True)
            reasoning_content += message.reasoning_content
    
        # Received content; start responding
        if message.content:
            if not is_answering:
                print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
                is_answering = True
            print(message.content, end="", flush=True)
            answer_content += message.content
    
    print("\n" + "=" * 20 + "Token usage" + "=" * 20 + "\n")
    print(chunk.usage)
    # After the loop, reasoning_content and answer_content contain the complete content
    # You can add further processing here as needed
    # print(f"\n\nFull reasoning process:\n{reasoning_content}")
    # print(f"\nFull response:\n{answer_content}")
    

    Response

    ====================Reasoning process====================
    
    Hmm, the user asked "Who are you?" I need to determine what they're looking for. They might be encountering me for the first time or verifying my identity. First, introduce my name—Qwen—and state I'm a large-scale language model developed by Tongyi Lab. Next, explain my capabilities: answering questions, creating text, coding, etc., so users understand my utility. Mention multilingual support to show international users they can interact in their preferred language. End with a friendly invitation to ask questions to encourage further interaction. Use clear, simple language and avoid excessive technical terms. The user might have deeper needs—like testing my abilities or seeking specific help—so providing concrete examples (writing stories, official documents, emails, etc.) helps. Ensure the response flows naturally without bullet points. Also clarify I'm an AI assistant without personal consciousness, basing all answers on training data to prevent misunderstandings. Check for missing key info like multimodal capabilities or recent updates, but keep it concise. Overall, aim for a helpful, friendly, and supportive reply that makes users feel understood.
    ====================Full response====================
    
    I am Qwen, a large-scale language model independently developed by Tongyi Lab under Alibaba Group. I can help you:
    
    1. **Answer questions**: Whether academic, general knowledge, or domain-specific, I'll do my best to assist.
    2. **Create text**: Write stories, official documents, emails, scripts—I can handle them all.
    3. **Logical reasoning**: I can help solve problems through logical analysis.
    4. **Programming**: I understand and generate code in multiple programming languages.
    5. **Multilingual support**: I support many languages, including but not limited to Chinese, English, German, French, and Spanish.
    
    If you have any questions or need help, just let me know!
    ====================Token usage====================
    
    {"input_tokens": 11, "output_tokens": 405, "total_tokens": 416, "output_tokens_details": {"reasoning_tokens": 256}, "prompt_tokens_details": {"cached_tokens": 0}}

    Java

    Example code

    // DashScope SDK version >= 2.19.4
    import com.alibaba.dashscope.aigc.generation.Generation;
    import com.alibaba.dashscope.aigc.generation.GenerationParam;
    import com.alibaba.dashscope.aigc.generation.GenerationResult;
    import com.alibaba.dashscope.common.Message;
    import com.alibaba.dashscope.common.Role;
    import com.alibaba.dashscope.exception.ApiException;
    import com.alibaba.dashscope.exception.InputRequiredException;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import com.alibaba.dashscope.utils.Constants;
    import io.reactivex.Flowable;
    import java.lang.System;
    import java.util.Arrays;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    public class Main {
        private static final Logger logger = LoggerFactory.getLogger(Main.class);
        private static StringBuilder reasoningContent = new StringBuilder();
        private static StringBuilder finalContent = new StringBuilder();
        private static boolean isFirstPrint = true;
    
        private static void handleGenerationResult(GenerationResult message) {
            String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
            String content = message.getOutput().getChoices().get(0).getMessage().getContent();
    
            if (reasoning != null && !reasoning.isEmpty()) {
                reasoningContent.append(reasoning);
                if (isFirstPrint) {
                    System.out.println("====================Reasoning process====================");
                    isFirstPrint = false;
                }
                System.out.print(reasoning);
            }
    
            if (content != null && !content.isEmpty()) {
                finalContent.append(content);
                if (!isFirstPrint) {
                    System.out.println("\n====================Full response====================");
                    isFirstPrint = true;
                }
                System.out.print(content);
            }
        }
        private static GenerationParam buildGenerationParam(Message userMsg) {
            return GenerationParam.builder()
                    // If you haven't configured an environment variable, replace the next line with: .apiKey("sk-xxx")
                    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                    .model("qwen-plus")
                    .enableThinking(true)
                    .incrementalOutput(true)
                    .resultFormat("message")
                    .messages(Arrays.asList(userMsg))
                    .build();
        }
        public static void streamCallWithMessage(Generation gen, Message userMsg)
                throws NoApiKeyException, ApiException, InputRequiredException {
            GenerationParam param = buildGenerationParam(userMsg);
            Flowable<GenerationResult> result = gen.streamCall(param);
            result.blockingForEach(message -> handleGenerationResult(message));
        }
    
        public static void main(String[] args) {
            try {
                // This is the base_url for the Singapore region. If you use a model in the Virginia region, replace base_url with https://dashscope-us.aliyuncs.com/api/v1
                // If you use a model in the Beijing region, replace base_url with https://dashscope.aliyuncs.com/api/v1
                Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
                Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
                streamCallWithMessage(gen, userMsg);
    //            // Print final results
    //            if (reasoningContent.length() > 0) {
    //                System.out.println("\n====================Full response====================");
    //                System.out.println(finalContent.toString());
    //            }
            } catch (ApiException | NoApiKeyException | InputRequiredException e) {
                logger.error("An exception occurred: {}", e.getMessage());
            }
            System.exit(0);
        }
    }

    Response

    ====================Reasoning process====================
    Hmm, the user asked "Who are you?" I need to figure out what they want to know. They might be curious about my identity or testing my response. Start by clearly stating I'm Qwen, a large-scale language model under Alibaba Group. Briefly outline my capabilities—answering questions, creating text, coding—to show my utility. Mention multilingual support so international users know they can interact in their preferred language. End with a friendly invitation to ask questions to make them feel welcome. Keep the response concise but informative. The user might have follow-up questions about technical details or use cases, but the initial reply should stay simple and clear. Avoid jargon so all users can understand. Double-check for key omissions like multilingual support and specific feature examples. This should cover their needs.
    ====================Full response====================
    I am Qwen, a large-scale language model under Alibaba Group. I can answer questions, create text (such as stories, official documents, emails, scripts), perform logical reasoning, write code, express opinions, play games, and support multilingual communication—including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need assistance, feel free to ask!

    HTTP

    Example code

    curl

    # ======= Important notes =======
    # This is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
    # If you use a model in the Virginia region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1/services/aigc/text-generation/generation
    # === Delete this comment before execution ===
    curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -H "X-DashScope-SSE: enable" \
    -d '{
        "model": "qwen-plus",
        "input":{
            "messages":[      
                {
                    "role": "user",
                    "content": "Who are you?"
                }
            ]
        },
        "parameters":{
            "enable_thinking": true,
            "incremental_output": true,
            "result_format": "message"
        }
    }'

    Response

    id:1
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Hmm","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"input_tokens":11,"output_tokens":3},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:2
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"input_tokens":11,"output_tokens":4},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:3
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"user","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":16,"input_tokens":11,"output_tokens":5},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:4
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"asks","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":17,"input_tokens":11,"output_tokens":6},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:5
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"\"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":18,"input_tokens":11,"output_tokens":7},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    ......
    
    id:358
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"help","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":373,"input_tokens":11,"output_tokens":362},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:359
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":",","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":374,"input_tokens":11,"output_tokens":363},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:360
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"Welcome","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":375,"input_tokens":11,"output_tokens":364},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:361
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"at any time","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":376,"input_tokens":11,"output_tokens":365},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:362
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"tell me","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":377,"input_tokens":11,"output_tokens":366},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:363
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
    
    id:364
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
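
    Each `data:` payload above is a standalone JSON document. A minimal sketch of pulling the incremental fields out of one payload (a real client would read these lines from the SSE response rather than a string literal):

```python
import json

# One "data:" line copied from the SSE response above.
raw = 'data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Hmm","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"input_tokens":11,"output_tokens":3},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}'

# Strip the "data:" prefix and parse the JSON body.
payload = json.loads(raw[len("data:"):])
message = payload["output"]["choices"][0]["message"]

print(message["reasoning_content"])       # Hmm
print(payload["usage"]["output_tokens"])  # 3
```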

    Additionally, the open-source Qwen3 hybrid-thinking models, as well as qwen-plus-2025-04-28 and qwen-turbo-2025-04-28, support dynamic control of thinking mode through prompts. When enable_thinking is set to true, appending /no_think to a prompt disables thinking mode for that turn. To re-enable thinking mode in a multi-turn conversation, append /think to the latest prompt. The model follows the most recent /think or /no_think instruction.
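
    As a minimal sketch of that soft switch, a hypothetical helper (apply_thinking_switch is not part of any SDK) that appends the directive to the latest prompt before the messages are sent with enable_thinking set to true:

```python
def apply_thinking_switch(messages, think):
    """Append /think or /no_think to the latest prompt.

    Illustrative only: the directive is interpreted by the model,
    not by this code.
    """
    directive = "/think" if think else "/no_think"
    switched = [dict(m) for m in messages]  # shallow copy; leave the input untouched
    switched[-1]["content"] = f'{switched[-1]["content"]} {directive}'
    return switched

msgs = apply_thinking_switch(
    [{"role": "user", "content": "What is 2 + 2?"}], think=False
)
print(msgs[-1]["content"])  # What is 2 + 2? /no_think
```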

    Limit thinking process length

    Deep thinking models can sometimes generate long inference processes. This increases wait times and consumes more tokens. Use the thinking_budget parameter to limit the maximum number of tokens for the inference process. If the limit is exceeded, the model immediately generates a response.

    The default value of thinking_budget is the model's maximum chain-of-thought length. For more information, see Model list.
    Important

    The thinking_budget parameter is supported by Qwen3 (thinking mode) and Kimi.

    OpenAI compatible

    Python

    Sample code

    from openai import OpenAI
    import os
    
    # Initialize the OpenAI client.
    client = OpenAI(
        # If the environment variable is not configured, replace the value with your Model Studio API key: api_key="sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the base_url for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/compatible-mode/v1.
    # If you use a model in the Beijing region, change the base_url to https://dashscope.aliyuncs.com/compatible-mode/v1.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    
    messages = [{"role": "user", "content": "Who are you"}]
    
    completion = client.chat.completions.create(
        model="qwen-plus",
        messages=messages,
        # The enable_thinking parameter enables the thinking process. The thinking_budget parameter sets the maximum number of tokens for the inference process.
        extra_body={
            "enable_thinking": True,
            "thinking_budget": 50
            },
        stream=True,
        stream_options={
            "include_usage": True
        },
    )
    
    reasoning_content = ""  # Complete thinking process
    answer_content = ""  # Complete response
    is_answering = False  # Indicates whether the response phase has started
    print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
    
    for chunk in completion:
        if not chunk.choices:
            print("\nUsage:")
            print(chunk.usage)
            continue
    
        delta = chunk.choices[0].delta
    
        # Collect only the thinking content.
        if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
            if not is_answering:
                print(delta.reasoning_content, end="", flush=True)
            reasoning_content += delta.reasoning_content
    
        # After receiving the content, start generating the response.
        if hasattr(delta, "content") and delta.content:
            if not is_answering:
                print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
                is_answering = True
            print(delta.content, end="", flush=True)
            answer_content += delta.content

    Response

    ====================Thinking Process====================
    
    Okay, the user asked "Who are you". I need to give a clear and friendly answer. First, I should clarify my identity as Qwen, developed by Tongyi Lab of Alibaba Group. Then, I need to explain my main functions, such as answering
    ====================Complete Response====================
    
    I am Qwen, a large-scale language model developed by Tongyi Lab of Alibaba Group. I can answer questions, create text, perform logical reasoning, and write code to help and assist users. Is there anything I can help you with?

    Node.js

    Sample code

    import OpenAI from "openai";
    import process from 'process';
    
    // Initialize the OpenAI client.
    const openai = new OpenAI({
        apiKey: process.env.DASHSCOPE_API_KEY, // Read from environment variable
        // The following is the base_url for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/compatible-mode/v1. 
        // If you use a model in the Beijing region, change the base_url to https://dashscope.aliyuncs.com/compatible-mode/v1.
        baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
    });
    
    let reasoningContent = '';
    let answerContent = '';
    let isAnswering = false;
    
    
    async function main() {
        try {
            const messages = [{ role: 'user', content: 'Who are you' }];
            const stream = await openai.chat.completions.create({
                model: 'qwen-plus',
                messages,
                stream: true,
                // The enable_thinking parameter enables the thinking process. The thinking_budget parameter sets the maximum number of tokens for the inference process.
                enable_thinking: true,
                thinking_budget: 50
            });
            console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
    
            for await (const chunk of stream) {
                if (!chunk.choices?.length) {
                    console.log('\nUsage:');
                    console.log(chunk.usage);
                    continue;
                }
    
                const delta = chunk.choices[0].delta;
                
                // Collect only the thinking content.
                if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                    if (!isAnswering) {
                        process.stdout.write(delta.reasoning_content);
                    }
                    reasoningContent += delta.reasoning_content;
                }
    
                // After receiving the content, start generating the response.
                if (delta.content !== undefined && delta.content) {
                    if (!isAnswering) {
                        console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
                        isAnswering = true;
                    }
                    process.stdout.write(delta.content);
                    answerContent += delta.content;
                }
            }
        } catch (error) {
            console.error('Error:', error);
        }
    }
    
    main();

    Response

    ====================Thinking Process====================
    
    Okay, the user asked "Who are you". I need to provide a clear and accurate answer. First, I should introduce myself as Qwen, developed by Tongyi Lab of Alibaba Group. Next, I should explain my main functions, such as answering questions
    ====================Complete Response====================
    
    I am Qwen, a large-scale language model independently developed by Tongyi Lab of Alibaba Group. I can perform various tasks such as answering questions, creating text, performing logical reasoning, and writing code. If you have any questions or need help, feel free to ask me at any time!

    HTTP

    Sample code

    curl

    # ======= Important =======
    # The following is the base_url for Singapore. If you use a model in the Beijing region, replace the base_url with https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
    # If you use a model in the US (Virginia) region, replace the base_url with https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
    # === Delete this comment before execution ===
    curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-plus",
        "messages": [
            {
                "role": "user", 
                "content": "Who are you"
            }
        ],
        "stream": true,
        "stream_options": {
            "include_usage": true
        },
        "enable_thinking": true,
        "thinking_budget": 50
    }'

    Response

    data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
    
    .....
    
    data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
    
    data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":10,"completion_tokens":360,"total_tokens":370},"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
    
    data: [DONE]
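
    Two details of the stream tail above are worth handling explicitly: the usage-only chunk arrives with an empty choices array, and the final data: [DONE] line is a plain sentinel rather than JSON. A minimal sketch, using simplified versions of the payloads shown above:

```python
import json

# Simplified tail of an OpenAI-compatible SSE stream, as shown above.
lines = [
    'data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":10,"completion_tokens":360,"total_tokens":370}}',
    "data: [DONE]",
]

total = None
for line in lines:
    body = line[len("data: "):]
    if body == "[DONE]":
        break  # end-of-stream sentinel; not JSON, so do not parse it
    chunk = json.loads(body)
    if not chunk["choices"]:  # usage-only chunk sent because include_usage is true
        total = chunk["usage"]["total_tokens"]

print(total)  # 370
```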

    DashScope

    The DashScope API for Qwen3.5 uses a multimodal interface, so the following example returns a URL error. For the correct invocation method, see Enable or disable thinking mode.

    Python

    Sample code

    import os
    from dashscope import Generation
    import dashscope
    # The following is the base_url for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
    # If you use a model in the Beijing region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
    dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"
    
    messages = [{"role": "user", "content": "Who are you?"}]
    
    completion = Generation.call(
        # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key = "sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        model="qwen-plus",
        messages=messages,
        result_format="message",
        enable_thinking=True,
        # Set the maximum number of tokens for the inference process.
        thinking_budget=50,
        stream=True,
        incremental_output=True,
    )
    
    # Define the complete thinking process.
    reasoning_content = ""
    # Define the complete response.
    answer_content = ""
    # Determine whether to end the thinking process and start the response.
    is_answering = False
    
    print("=" * 20 + "Thinking Process" + "=" * 20)
    
    for chunk in completion:
        message = chunk.output.choices[0].message
        # Skip chunks in which both the thinking process and the response are empty.
        if message.content == "" and message.reasoning_content == "":
            continue
        # Thinking phase: only reasoning_content is populated.
        if message.reasoning_content != "" and message.content == "":
            print(message.reasoning_content, end="", flush=True)
            reasoning_content += message.reasoning_content
        # Response phase: content is populated.
        elif message.content != "":
            if not is_answering:
                print("\n" + "=" * 20 + "Complete Response" + "=" * 20)
                is_answering = True
            print(message.content, end="", flush=True)
            answer_content += message.content
    
    # To print the complete thinking process and the complete response, uncomment and run the following code.
    # print("=" * 20 + "Complete Thinking Process" + "=" * 20 + "\n")
    # print(f"{reasoning_content}")
    # print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
    # print(f"{answer_content}")
    

    Response

    ====================Thinking Process====================
    Okay, the user asked "Who are you?". I need to give a clear and friendly answer. First, I should introduce myself as Qwen, developed by Tongyi Lab of Alibaba Group. Next, I should explain my main functions, such as
    ====================Complete Response====================
    I am Qwen, a large-scale language model independently developed by Tongyi Lab of Alibaba Group. I can answer questions, create text, perform logical reasoning, and write code to provide users with comprehensive, accurate, and useful information and help. Is there anything I can help you with?

    Java

    Sample code

    // DashScope SDK version >= 2.19.4
    import java.util.Arrays;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import com.alibaba.dashscope.aigc.generation.Generation;
    import com.alibaba.dashscope.aigc.generation.GenerationParam;
    import com.alibaba.dashscope.aigc.generation.GenerationResult;
    import com.alibaba.dashscope.common.Message;
    import com.alibaba.dashscope.common.Role;
    import com.alibaba.dashscope.exception.ApiException;
    import com.alibaba.dashscope.exception.InputRequiredException;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import io.reactivex.Flowable;
    import java.lang.System;
    import com.alibaba.dashscope.utils.Constants;
    
    public class Main {
        static {
            // The following is the base_url for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
            // If you use a model in the Beijing region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
            Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
        }
        private static final Logger logger = LoggerFactory.getLogger(Main.class);
        private static StringBuilder reasoningContent = new StringBuilder();
        private static StringBuilder finalContent = new StringBuilder();
        private static boolean isFirstPrint = true;
    
        private static void handleGenerationResult(GenerationResult message) {
            String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
            String content = message.getOutput().getChoices().get(0).getMessage().getContent();
    
            if (!reasoning.isEmpty()) {
                reasoningContent.append(reasoning);
                if (isFirstPrint) {
                    System.out.println("====================Thinking Process====================");
                    isFirstPrint = false;
                }
                System.out.print(reasoning);
            }
    
            if (!content.isEmpty()) {
                finalContent.append(content);
                if (!isFirstPrint) {
                    System.out.println("\n====================Complete Response====================");
                    isFirstPrint = true;
                }
                System.out.print(content);
            }
        }
        private static GenerationParam buildGenerationParam(Message userMsg) {
            return GenerationParam.builder()
                    // If the environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                    .model("qwen-plus")
                    .enableThinking(true)
                    .thinkingBudget(50)
                    .incrementalOutput(true)
                    .resultFormat("message")
                    .messages(Arrays.asList(userMsg))
                    .build();
        }
        public static void streamCallWithMessage(Generation gen, Message userMsg)
                throws NoApiKeyException, ApiException, InputRequiredException {
            GenerationParam param = buildGenerationParam(userMsg);
            Flowable<GenerationResult> result = gen.streamCall(param);
            result.blockingForEach(message -> handleGenerationResult(message));
        }
    
        public static void main(String[] args) {
            try {
                Generation gen = new Generation();
                Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
                streamCallWithMessage(gen, userMsg);
    //             Print the final result.
    //            if (reasoningContent.length() > 0) {
    //                System.out.println("\n====================Complete Response====================");
    //                System.out.println(finalContent.toString());
    //            }
            } catch (ApiException | NoApiKeyException | InputRequiredException e) {
                logger.error("An exception occurred: {}", e.getMessage());
            }
            System.exit(0);
        }
    }

    Response

    ====================Thinking Process====================
    Okay, the user asked "Who are you?". I need to give a clear and friendly answer. First, I should introduce myself as Qwen, developed by Tongyi Lab of Alibaba Group. Next, I should explain my main functions, such as
    ====================Complete Response====================
    I am Qwen, a large-scale language model independently developed by Tongyi Lab of Alibaba Group. I can answer questions, create text, perform logical reasoning, and write code to provide users with comprehensive, accurate, and useful information and help. Is there anything I can help you with?

    HTTP

    Sample code

    curl

    # ======= Important =======
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
    # If you use a model in the US (Virginia) region, replace the URL with https://dashscope-us.aliyuncs.com/api/v1/services/aigc/text-generation/generation
    # === Delete this comment before execution ===
    curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -H "X-DashScope-SSE: enable" \
    -d '{
        "model": "qwen-plus",
        "input":{
            "messages":[      
                {
                    "role": "user",
                    "content": "Who are you?"
                }
            ]
        },
        "parameters":{
            "enable_thinking": true,
            "thinking_budget": 50,
            "incremental_output": true,
            "result_format": "message"
        }
    }'

    Response

    id:1
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"OK","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"output_tokens":3,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":1}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
    
    id:2
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"output_tokens":4,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":2}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
    
    ......
    
    id:133
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":149,"output_tokens":138,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":50}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
    
    id:134
    event:result
    :HTTP_STATUS/200
    data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":149,"output_tokens":138,"input_tokens":11,"output_tokens_details":{"reasoning_tokens":50}},"request_id":"2ce91085-3602-9c32-9c8b-fe3d583a2c38"}
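
    The SSE stream above can also be consumed without an SDK by reading the HTTP response line by line and decoding only the `data:` payloads (the `id:`, `event:`, and `:HTTP_STATUS` lines are metadata). The `parse_sse_events` helper below is a hypothetical sketch, shown on a truncated sample line from the response above, not part of any DashScope SDK:

    ```python
    import json

    def parse_sse_events(lines):
        """Yield parsed JSON payloads from DashScope SSE lines.

        Only lines starting with 'data:' carry a JSON payload; other
        fields (id:, event:, and ':'-prefixed comments) are skipped.
        The '[DONE]' sentinel used by the OpenAI-compatible endpoint
        ends the stream.
        """
        for line in lines:
            line = line.strip()
            if not line.startswith("data:"):
                continue
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                return
            yield json.loads(payload)

    # Sample lines abbreviated from the response above.
    sample = [
        "id:1",
        "event:result",
        ":HTTP_STATUS/200",
        'data:{"output":{"choices":[{"message":{"content":"",'
        '"reasoning_content":"OK","role":"assistant"},"finish_reason":"null"}]}}',
    ]
    events = list(parse_sse_events(sample))
    ```

    In a real call, the same helper could be fed `response.iter_lines()` from a streaming HTTP client instead of the `sample` list.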

    Other features

    • Multi-turn conversations

    • Tool calling

    • Web search

    Billing

    • Charges for thinking content are based on output tokens.

    • Some hybrid thinking models have different prices for thinking mode and non-thinking mode.

      If a model does not output the thinking process in thinking mode, it is billed at the non-thinking mode price.
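
    As an illustration of the first point, the cost of a call can be estimated directly from its usage block, because reasoning tokens are already included in `output_tokens` and need no separate term. The helper and the per-1,000-token prices below are hypothetical placeholders, not Model Studio's actual rates:

    ```python
    def estimate_cost(usage, price_per_1k_input, price_per_1k_output):
        """Estimate the cost of one call from its usage block.

        Thinking content is billed as output tokens, so reasoning_tokens
        (a subset of output_tokens) contribute no extra term.
        """
        return (usage["input_tokens"] / 1000 * price_per_1k_input
                + usage["output_tokens"] / 1000 * price_per_1k_output)

    # Usage block taken from the HTTP response above.
    usage = {
        "input_tokens": 11,
        "output_tokens": 138,
        "output_tokens_details": {"reasoning_tokens": 50},
    }
    cost = estimate_cost(usage, 0.4, 1.2)  # hypothetical prices per 1K tokens
    ```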

    FAQ

    Q: How do I disable thinking mode?

    Whether thinking mode can be disabled depends on the model type:

    • If you use a hybrid-thinking model (such as qwen-plus or deepseek-v3.2-exp), set enable_thinking to false.

    • If you use a thinking-only model (such as qwen3-235b-a22b-thinking-2507 or deepseek-r1), thinking mode cannot be disabled.
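
    The distinction above can be captured in a small request-building helper. This is a hypothetical sketch (`build_parameters` and the model sets are illustrative subsets for demonstration, not part of any SDK or a complete model list):

    ```python
    # Illustrative subsets only; see the documentation for full model lists.
    HYBRID_THINKING_MODELS = {"qwen-plus", "deepseek-v3.2-exp"}
    THINKING_ONLY_MODELS = {"qwen3-235b-a22b-thinking-2507", "deepseek-r1"}

    def build_parameters(model: str, enable_thinking: bool) -> dict:
        """Build a DashScope 'parameters' object, rejecting attempts to
        disable thinking mode on thinking-only models."""
        if model in THINKING_ONLY_MODELS and not enable_thinking:
            raise ValueError(
                f"{model} is a thinking-only model; thinking mode cannot be disabled"
            )
        return {
            "enable_thinking": enable_thinking,
            "result_format": "message",
            "incremental_output": True,
        }
    ```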

    Q: Which models support non-streaming output?

    Deep thinking models require time to reason before responding. This increases response latency and raises timeout risks for non-streaming calls. We recommend using streaming calls. If you need non-streaming output, use one of the following supported models.

    Qwen

    • Commercial edition

      • Qwen Max series: qwen3-max-preview

      • Qwen Plus series: qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen-plus

      • Qwen Flash series: qwen3.5-flash, qwen3.5-flash-2026-02-23, qwen-flash, qwen-flash-2025-07-28

      • Qwen Turbo series: qwen-turbo

    • Open-source edition

      • qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b, qwen3-next-80b-a3b-thinking, qwen3-235b-a22b-thinking-2507, qwen3-30b-a3b-thinking-2507

    DeepSeek (Beijing)

    deepseek-v3.2, deepseek-v3.2-exp, deepseek-r1, deepseek-r1-0528, distilled DeepSeek-R1 models

    Kimi (Beijing)

    kimi-k2-thinking

    Q: How do I purchase tokens after my free quota runs out?

    Go to the Expenses and Costs center to top up your account. Ensure that your account has no overdue payments before calling models.

    After you exceed your free quota, model calls are billed automatically. Billing occurs hourly. To view your spending details, go to Bill details.

    Q: Can I upload images or documents to ask questions?

    The models in this topic support only text input. Qwen3-VL and QVQ support deep thinking on images.

    Q: How do I view token usage and call counts?

    One hour after you call a model, go to the Monitoring (Singapore or Beijing) page. Set the query conditions, such as the time range and workspace. Then, in the Models area, find the target model and click Monitor in the Actions column to view the model's call statistics. For more information, see the Monitoring document.

    Data is updated hourly. During peak periods, updates may be delayed by an hour or more.

    API reference

    See the input and output parameters for deep thinking models in Qwen.

    Error codes

    If execution fails, see Error messages for troubleshooting.
