
Alibaba Cloud Model Studio: OpenAI compatible - Responses

Last Updated: Mar 30, 2026

Model Studio’s Qwen models support the OpenAI-compatible Responses API. As an evolution of the Chat Completions API, the Responses API delivers native agent capabilities in a concise and intuitive way.

Advantages over the OpenAI Chat Completions API:

  • Built-in tools: Includes built-in tools such as web search, web extractor, code interpreter, text-to-image search, and image-to-image search. These tools improve results for complex tasks. For more information, see Call built-in tools.

  • More flexible input: Accepts either a plain string or an array of messages in Chat format as model input.

  • Simplified context management: Pass the previous_response_id from the previous response instead of manually constructing a full message history array.

For more information about the input and output parameters, see the OpenAI Responses API reference.
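As an illustration of the flexible-input point above, the string inputs used in the examples below can equivalently be passed as an array of Chat-format messages, for example to include a system prompt. A minimal sketch of such a request body (the system prompt text is illustrative):

```python
import json

# `input` accepts either a plain string or an array of Chat-format
# messages (role/content pairs).
payload = {
    "model": "qwen3.5-plus",
    "input": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What can you do?"},
    ],
}
print(json.dumps(payload, indent=2))
```

The same body can be sent to the `/responses` HTTP endpoint, or the array can be passed as the `input` argument of `client.responses.create` in the SDK.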

Prerequisites

First, obtain an API key and set it as an environment variable. If you use the OpenAI SDK, install it as well.
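For reference, on Linux or macOS the setup might look like the following (the key value is a placeholder):

```shell
# Set the API key as an environment variable (replace with your own key)
export DASHSCOPE_API_KEY="sk-xxx"

# Install or upgrade the OpenAI SDK if you plan to use it
pip install -U openai
```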

Supported models

qwen3-max, qwen3-max-2026-01-23, qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen3.5-flash, qwen3.5-flash-2026-02-23, qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b, qwen-plus, qwen-flash, qwen3-coder-plus, qwen3-coder-flash.

Service endpoints

Singapore

base_url for SDK: https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1

HTTP endpoint: POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses

China (Beijing)

base_url for SDK: https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1

HTTP endpoint: POST https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses

Code examples

Basic call

Send a message and retrieve the model’s reply.

Python

import os
from openai import OpenAI

client = OpenAI(
    # If environment variable is not set, replace with: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)

response = client.responses.create(
    model="qwen3.5-plus",
    input="What can you do?"
)

# Get model response
# print(response.model_dump_json())
print(response.output_text)

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
    // If environment variable is not set, replace with: apiKey: "sk-xxx"
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});

async function main() {
    const response = await openai.responses.create({
        model: "qwen3.5-plus",
        input: "What can you do?"
    });

    // Get model response
    console.log(response.output_text);
}

main();

curl

curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.5-plus",
    "input": "What can you do?"
}'

Response example

The API returns the following complete response:

{
    "created_at": 1771226624,
    "id": "bf0d5c2e-f14b-9ad7-bc0d-ee0c8c9ee2d8",
    "model": "qwen3-max-2026-01-23",
    "object": "response",
    "output": [
        {
            "content": [
                {
                    "annotations": [],
                    "text": "Hi there!  I'm actually quite ......",
                    "type": "output_text"
                }
            ],
            "id": "msg_1e17fdb2-5fc3-4c78-a9e9-cbd78eb043f0",
            "role": "assistant",
            "status": "completed",
            "type": "message"
        }
    ],
    "parallel_tool_calls": false,
    "status": "completed",
    "tool_choice": "auto",
    "tools": [],
    "usage": {
        "input_tokens": 37,
        "input_tokens_details": {
            "cached_tokens": 0
        },
        "output_tokens": 220,
        "output_tokens_details": {
            "reasoning_tokens": 0
        },
        "total_tokens": 257,
        "x_details": [
            {
                "input_tokens": 37,
                "output_tokens": 220,
                "total_tokens": 257,
                "x_billing_type": "response_api"
            }
        ]
    }
}

Multi-turn conversation

Use previous_response_id to link context instead of manually building the message history. The response id is valid for 7 days.

Use the id from the previous response (for example, f0dbb153-117f-9bbf-8176-5284b47f3xxx, in UUID format) as the value of previous_response_id. Do not use the id of a message within the output array (for example, msg_56c860c4-3ad8-4a96-8553-d2f94c259xxx).

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)

# First round
response1 = client.responses.create(
    model="qwen3.5-plus",
    input="My name is John, please remember it."
)
print(f"First response: {response1.output_text}")

# Second round - use previous_response_id to link context
# The response id expires in 7 days
response2 = client.responses.create(
    model="qwen3.5-plus",
    input="Do you remember my name?",
    previous_response_id=response1.id
)
print(f"Second response: {response2.output_text}")

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});

async function main() {
    // First round
    const response1 = await openai.responses.create({
        model: "qwen3.5-plus",
        input: "My name is John, please remember it."
    });
    console.log(`First response: ${response1.output_text}`);

    // Second round - use previous_response_id to link context
    // The response id expires in 7 days
    const response2 = await openai.responses.create({
        model: "qwen3.5-plus",
        input: "Do you remember my name?",
        previous_response_id: response1.id
    });
    console.log(`Second response: ${response2.output_text}`);
}

main();

curl

# First round
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.5-plus",
    "input": "My name is John, please remember it."
}'

# Second round - use the id from first response as previous_response_id
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.5-plus",
    "input": "Do you remember my name?",
    "previous_response_id": "response_id_from_first_round"
}'

Second-turn response example

{
  "id": "f0dbb153-117f-9bbf-8176-5284b47f3xxx",
  "created_at": 1769173209.0,
  "model": "qwen3.5-plus",
  "object": "response",
  "status": "completed",
  "output": [
    {
      "id": "msg_56c860c4-3ad8-4a96-8553-d2f94c259xxx",
      "type": "message",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Yes, John! I remember your name. How can I assist you today?",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 78,
    "output_tokens": 16,
    "total_tokens": 94,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}

Note: The input_tokens for the second turn is 78, which includes the context from the first turn. The model successfully remembered the name “John”.

Streaming output

Streaming output returns model-generated content in real time as it is produced, which is well suited to long text generation.

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)

stream = client.responses.create(
    model="qwen3.5-plus",
    input="Please briefly introduce artificial intelligence.",
    stream=True
)

print("Receiving stream output:")
for event in stream:
    # print(event.model_dump_json())  # Uncomment to see raw event response
    if event.type == 'response.output_text.delta':
        print(event.delta, end='', flush=True)
    elif event.type == 'response.completed':
        print("\nStream completed")
        print(f"Total tokens: {event.response.usage.total_tokens}")

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});

async function main() {
    const stream = await openai.responses.create({
        model: "qwen3.5-plus",
        input: "Please briefly introduce artificial intelligence.",
        stream: true
    });

    console.log("Receiving stream output:");
    for await (const event of stream) {
        // console.log(JSON.stringify(event));  // Uncomment to see raw event response
        if (event.type === 'response.output_text.delta') {
            process.stdout.write(event.delta);
        } else if (event.type === 'response.completed') {
            console.log("\nStream completed");
            console.log(`Total tokens: ${event.response.usage.total_tokens}`);
        }
    }
}

main();

curl

curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.5-plus",
    "input": "Please briefly introduce artificial intelligence.",
    "stream": true
}'

Response example

{"response":{"id":"47a71e7d-868c-4204-9693-ef8ff9058xxx","created_at":1769417481.0,"error":null,"incomplete_details":null,"instructions":null,"metadata":null,"model":"","object":"response","output":[],"parallel_tool_calls":false,"temperature":null,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"completed_at":null,"conversation":null,"max_output_tokens":null,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"prompt_cache_retention":null,"reasoning":null,"safety_identifier":null,"service_tier":null,"status":"queued","text":null,"top_logprobs":null,"truncation":null,"usage":null,"user":null},"sequence_number":0,"type":"response.created"}
{"response":{"id":"47a71e7d-868c-4204-9693-ef8ff9058xxx","created_at":1769417481.0,"error":null,"incomplete_details":null,"instructions":null,"metadata":null,"model":"","object":"response","output":[],"parallel_tool_calls":false,"temperature":null,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"completed_at":null,"conversation":null,"max_output_tokens":null,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"prompt_cache_retention":null,"reasoning":null,"safety_identifier":null,"service_tier":null,"status":"in_progress","text":null,"top_logprobs":null,"truncation":null,"usage":null,"user":null},"sequence_number":1,"type":"response.in_progress"}
{"item":{"id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","content":[],"role":"assistant","status":"in_progress","type":"message"},"output_index":0,"sequence_number":2,"type":"response.output_item.added"}
{"content_index":0,"item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","output_index":0,"part":{"annotations":[],"text":"","type":"output_text","logprobs":null},"sequence_number":3,"type":"response.content_part.added"}
{"content_index":0,"delta":"Artificial intelligence","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":4,"type":"response.output_text.delta"}
{"content_index":0,"delta":" (Art","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":5,"type":"response.output_text.delta"}
{"content_index":0,"delta":"ificial Intelligence, ","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":6,"type":"response.output_text.delta"}
{"content_index":0,"delta":"or AI)","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":7,"type":"response.output_text.delta"}
... (intermediate events omitted) ...
{"content_index":0,"delta":" fields, and is profoundly changing our","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":38,"type":"response.output_text.delta"}
{"content_index":0,"delta":" lives and ways of work","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":39,"type":"response.output_text.delta"}
{"content_index":0,"delta":".","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":40,"type":"response.output_text.delta"}
{"content_index":0,"item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":41,"text":"Artificial intelligence (AI) is the technology and science of simulating human intelligent behavior with computer systems. xxxx","type":"response.output_text.done"}
{"content_index":0,"item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","output_index":0,"part":{"annotations":[],"text":"Artificial intelligence (AI) is the technology and science of simulating human intelligent behavior with computer systems. xxx","type":"output_text","logprobs":null},"sequence_number":42,"type":"response.content_part.done"}
{"item":{"id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","content":[{"annotations":[],"text":"Artificial intelligence (AI) is the technology and science of simulating human intelligent behavior with computer systems. It aims to enable machines to perform tasks that typically require human intelligence, such as:\n\n- **Learning** (for example, training models with data)  \n- **Reasoning** (for example, logical judgment and problem-solving)  \n- **Perception** (for example, recognizing images, speech, or text)  \n- **Understanding language** (for example, natural language processing)  \n- **Decision-making** (for example, making optimal choices in complex environments)\n\nArtificial intelligence can be divided into **weak AI** (focusing on specific tasks, such as voice assistants and recommendation systems) and **strong AI** (possessing general intelligence similar to humans, which has not yet been achieved).\n\nCurrently, AI has been widely applied in many fields, such as healthcare, finance, transportation, education, and entertainment, and is profoundly changing our lives and ways of work.","type":"output_text","logprobs":null}],"role":"assistant","status":"completed","type":"message"},"output_index":0,"sequence_number":43,"type":"response.output_item.done"}
{"response":{"id":"47a71e7d-868c-4204-9693-ef8ff9058xxx","created_at":1769417481.0,"error":null,"incomplete_details":null,"instructions":null,"metadata":null,"model":"qwen3.5-plus","object":"response","output":[{"id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","content":[{"annotations":[],"text":"Artificial intelligence (AI) is xxxxxx","type":"output_text","logprobs":null}],"role":"assistant","status":"completed","type":"message"}],"parallel_tool_calls":false,"temperature":null,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"completed_at":null,"conversation":null,"max_output_tokens":null,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"prompt_cache_retention":null,"reasoning":null,"safety_identifier":null,"service_tier":null,"status":"completed","text":null,"top_logprobs":null,"truncation":null,"usage":{"input_tokens":37,"input_tokens_details":{"cached_tokens":0},"output_tokens":166,"output_tokens_details":{"reasoning_tokens":0},"total_tokens":203},"user":null},"sequence_number":44,"type":"response.completed"}

Deep thinking

With deep thinking mode enabled, the model reasons before replying. The thinking process appears in an output item of type reasoning, suitable for complex reasoning tasks.

The thinking_budget parameter for controlling the maximum thinking length is not supported.

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)

response = client.responses.create(
    model="qwen3.5-plus",
    input="Which is larger, 9.9 or 9.11?",
    extra_body={"enable_thinking": True}
)

# Process the output
for item in response.output:
    if item.type == "reasoning":
        print("=== Thinking Process ===")
        for summary in item.summary:
            print(summary.text)
    elif item.type == "message":
        print("\n=== Final Answer ===")
        print(item.content[0].text)

# View the number of thinking tokens
print(f"\nThinking Tokens: {response.usage.output_tokens_details.reasoning_tokens}")

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});

async function main() {
    const response = await openai.responses.create({
        model: "qwen3.5-plus",
        input: "Which is larger, 9.9 or 9.11?",
        enable_thinking: true
    });

    for (const item of response.output) {
        if (item.type === "reasoning") {
            console.log("=== Thinking Process ===");
            for (const summary of item.summary) {
                console.log(summary.text);
            }
        } else if (item.type === "message") {
            console.log("\n=== Final Answer ===");
            console.log(item.content[0].text);
        }
    }

    // View the number of thinking tokens
    console.log(`\nThinking Tokens: ${response.usage.output_tokens_details.reasoning_tokens}`);
}

main();

curl

curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.5-plus",
    "input": "Which is larger, 9.9 or 9.11?",
    "enable_thinking": true
}'

Response example

{
    "created_at": 1774498317,
    "id": "resp_xxx",
    "model": "qwen3.5-plus",
    "object": "response",
    "output": [
        {
            "id": "msg_xxx",
            "summary": [
                {
                    "text": "Thinking Process:\n\n1.  **Analyze the Request:**\n    *   Question: \"Which is larger, 9.9 or 9.11?\"\n    *   Context: The user is asking a simple mathematical comparison question.\n    *   Current Date: Thursday, March 26, 2026 (provided in the system prompt).\n    *   Knowledge Cutoff: 2026 (provided in the system prompt).\n\n2.  **Evaluate the Numbers:**\n    *   Number A: 9.9\n    *   Number B: 9.11\n    *   These are decimal numbers.\n    *   Compare the integer part: Both are 9.\n    *   Compare the tenths place (first decimal digit):\n        *   9.9 has 9 in the tenths place.\n        *   9.11 has 1 in the tenths place.\n    *   Since 9 > 1, 9.9 is greater than 9.11.\n\n3.  **Consider Potential Ambiguities:**\n    *   Could this be version numbering? (e.g., software versions). In versioning, 9.11 is often \"newer\" or \"higher\" than 9.9. However, mathematically, 9.9 > 9.11.\n    *   Could this be dates? (September 9th vs September 11th). 11th is later.\n    *   Standard interpretation: Without context, decimal numbers are assumed to be mathematical values.\n    *   Common pitfall: Some people mistakenly treat decimals like whole numbers (where 11 > 9), leading them to think 9.11 > 9.9. This is a known cognitive bias or misconception in elementary math.\n    *   Decision: Provide the mathematical answer clearly, but perhaps acknowledge the versioning context if relevant (though usually, for this specific question, it's a math test). Given the simplicity, stick to the mathematical truth first.\n\n4.  **Formulate the Answer:**\n    *   Direct answer: 9.9 is larger.\n    *   Explanation: Compare place values. 9.9 = 9.90, 9.11 = 9.11. 90 hundredths > 11 hundredths.\n    *   Tone: Helpful, clear.\n\n5.  **Drafting the Response (Internal Monologue/Draft):**\n    *   Okay, mathematically 9.9 is bigger. 9.9 is 9 and 9 tenths. 9.11 is 9 and 11 hundredths (or 1 tenth and 1 hundredth). 9 tenths is more than 1 tenth.\n    *   So, 9.9 > 9.11.\n    *   I should answer in Chinese since the question is in Chinese.\n    *   \"9.9 is larger.\"\n    *   Add a brief explanation to avoid confusion. \"Because 9.9 equals 9.90, and 9.90 is greater than 9.11.\"\n\n6.  **Refining based on \"Version Number\" possibility:**\n    *   Sometimes this is a trick question about software versions. In semver, 9.11 > 9.9.\n    *   However, usually, when asked plainly \"which is larger\", it refers to numerical value.\n    *   I will provide the numerical answer as primary, maybe add a note about versions if it feels necessary. But keeping it simple is better. Most likely, this is testing the decimal comparison logic.\n    *   Let's stick to the mathematical value as the default.\n\n7.  **Final Polish:**\n    *   Answer: 9.9 is larger.\n    *   Reasoning: When comparing decimals, first look at the integer part, then the tenths place. The tenths digit of 9.9 is 9, and the tenths digit of 9.11 is 1. Since 9 > 1, 9.9 > 9.11.\n\n8.  **Output Generation:** (Matching the thought process)\n    *   State clearly which is larger.\n    *   Explain why.\n\n    *Self-Correction on Date/Time:* The system prompt mentions the current date is 2026. This doesn't affect the math question, but I should ensure I don't reference anything post-2026 if I were adding context (not needed here).\n\n    *Final Decision:* Just answer the math question directly.\n\n    \"Numerically, 9.9 is larger.\"\n    Explain: 9.9 = 9.90, 9.11 = 9.11. 90 > 11.\n\n    Wait, is there any other interpretation?\n    - Dates? 9/9 vs 9/11. 11th is later.\n    - Versions? 9.11 is newer.\n    - But \"谁大\" (which means \"who is big/larger\") usually implies magnitude.\n    - I will answer based on numerical magnitude but briefly mention the version context if it adds value without confusing. Actually, better to be decisive. Numerical value is the standard interpretation for \"who is larger\" with numbers.\n\n    Let's go with the numerical answer.\n\n    **Plan:**\n    1. State 9.9 is larger numerically.\n    2. Explain the decimal comparison.\n    3. (Optional but helpful) Mention that if it's version numbers, 9.11 might be considered \"higher\". But prioritize the math.\n    Actually, for a simple query like this, over-explaining versions might be confusing. I'll stick to the math but acknowledge the common confusion.\n\n    *Revised Plan:*\n    1. Direct answer: 9.9 is larger (mathematically).\n    2. Explanation: Place value.\n    3. Note: Unless it's a version number.\n\n    Let's keep it concise.\n\n    \"In terms of mathematical value, 9.9 is larger.\"\n    \"Because 9.9 is equal to 9.90, which is larger than 9.11.\"\n    \"If it's a version number, then version 9.11 is usually higher than version 9.9.\"\n    This covers both bases.\n\n    Wait, looking at common AI training, this is often a \"trap\" question to see if the AI treats decimals like integers. The correct mathematical answer is crucial.\n\n    Okay, final output structure:\n    1. Mathematical answer.\n    2. Brief explanation.\n    3. Contextual note (versions).\n\n    Let's write it.\n    \"In terms of mathematical value, 9.9 is larger.\"\n    \"When comparing decimals, first look at the integer part (both are 9), then the tenths place: the tenths digit of 9.9 is 9, while the tenths digit of 9.11 is 1. Because 9 > 1, 9.9 > 9.11.\"\n    \"However, if these are software version numbers, version 9.11 would typically be newer than version 9.9.\"\n\n    This is balanced.\n\n    Wait, I need to check the instruction again. \"Please remember the current actual time...\". This is just system context. I don't need to mention the date in the answer unless relevant. It's not relevant here.\n\n    Okay, ready to generate.\n\n    Wait, one more check. Is there any cultural context? In some contexts, people might say 9.11 is bigger because 11 > 9. This is the misconception I need to correct.\n    So, emphasizing the mathematical rule is key.\n\n    Final check:\n    9.9 = 9 + 9/10 = 9.90\n    9.11 = 9 + 1/10 + 1/100 = 9.11\n    9.90 > 9.11. Correct.\n\n    The response will be in Chinese.\n    \"In terms of numerical value, 9.9 is larger.\"\n    Explain why.\n    Mention version exception.\n    Done.\n",
                    "type": "summary_text"
                }
            ],
            "type": "reasoning"
        },
        {
            "content": [
                {
                    "annotations": [],
                    "text": "In terms of **mathematical value**, **9.9 is larger**.\n\nHere's why:\nWhen comparing decimals, you first look at the integer part, which is 9 for both. Then you look at the tenths place (the first digit after the decimal point):\n*   The tenths digit of 9.9 is **9**\n*   The tenths digit of 9.11 is **1**\n\nBecause 9 is greater than 1, **9.9 > 9.11** (you can think of 9.9 as 9.90 for comparison).\n\n**Note**: If these were **software version numbers**, version 9.11 would typically be considered newer (higher) than version 9.9. But in a purely numerical comparison, 9.9 is larger.",
                    "type": "output_text"
                }
            ],
            "id": "msg_xxx",
            "role": "assistant",
            "status": "completed",
            "type": "message"
        }
    ],
    "parallel_tool_calls": false,
    "status": "completed",
    "tool_choice": "auto",
    "tools": [],
    "usage": {
        "input_tokens": 57,
        "input_tokens_details": {
            "cached_tokens": 0
        },
        "output_tokens": 2018,
        "output_tokens_details": {
            "reasoning_tokens": 1861
        },
        "total_tokens": 2075,
        "x_details": [
            {
                "input_tokens": 57,
                "output_tokens": 2018,
                "output_tokens_details": {
                    "reasoning_tokens": 1861
                },
                "total_tokens": 2075,
                "x_billing_type": "response_api"
            }
        ]
    }
}

Call built-in tools

Enable built-in tools to improve results for complex tasks. Web extractor and code interpreter are free for a limited time. For supported tools, see Tool calling.

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)

response = client.responses.create(
    model="qwen3.5-plus",
    input="Find the Alibaba Cloud website and extract key information",
    # For best results, enable all the built-in tools
    tools=[
        {"type": "web_search"},
        {"type": "code_interpreter"},
        {"type": "web_extractor"}
    ],
    extra_body={"enable_thinking": True}
)

# Uncomment the line below to see the intermediate output
# print(response.output)
print(response.output_text)

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});

async function main() {
    const response = await openai.responses.create({
        model: "qwen3.5-plus",
        input: "Find the Alibaba Cloud website and extract key information",
        tools: [
            { type: "web_search" },
            { type: "code_interpreter" },
            { type: "web_extractor" }
        ],
        enable_thinking: true
    });

    for (const item of response.output) {
        if (item.type === "reasoning") {
            console.log("Model is thinking...");
        } else if (item.type === "web_search_call") {
            console.log(`Search query: ${item.action.query}`);
        } else if (item.type === "web_extractor_call") {
            console.log("Extracting web content...");
        } else if (item.type === "message") {
            console.log(`Response: ${item.content[0].text}`);
        }
    }
}

main();

curl

curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.5-plus",
    "input": "Find the Alibaba Cloud website and extract key information",
    "tools": [
        {
            "type": "web_search"
        },
        {
            "type": "code_interpreter"
        },
        {
            "type": "web_extractor"
        }
    ],
    "enable_thinking": true
}'

Response example

{
    "id": "69258b21-5099-9d09-92e8-8492b1955xxx",
    "object": "response",
    "status": "completed",
    "output": [
        {
            "type": "reasoning",
            "summary": [
                {
                    "type": "summary_text",
                    "text": "The user wants to find the Alibaba Cloud official website and extract information..."
                }
            ]
        },
        {
            "type": "web_search_call",
            "status": "completed",
            "action": {
                "query": "Alibaba Cloud official website",
                "type": "search",
                "sources": [
                    {
                        "type": "url",
                        "url": "https://cn.aliyun.com/"
                    },
                    {
                        "type": "url",
                        "url": "https://www.alibabacloud.com/zh"
                    }
                ]
            }
        },
        {
            "type": "reasoning",
            "summary": [
                {
                    "type": "summary_text",
                    "text": "The search results show the URL of the Alibaba Cloud official website..."
                }
            ]
        },
        {
            "type": "web_extractor_call",
            "status": "completed",
            "goal": "Extract key information from the homepage of the Alibaba Cloud official website",
            "output": "Qwen large language model, complete product system, AI solutions...",
            "urls": [
                "https://cn.aliyun.com/"
            ]
        },
        {
            "type": "message",
            "role": "assistant",
            "status": "completed",
            "content": [
                {
                    "type": "output_text",
                    "text": "Key information from the Alibaba Cloud official website: Qwen large language model, cloud computing services..."
                }
            ]
        }
    ],
    "usage": {
        "input_tokens": 40836,
        "output_tokens": 2106,
        "total_tokens": 42942,
        "output_tokens_details": {
            "reasoning_tokens": 677
        },
        "x_tools": {
            "web_extractor": {
                "count": 1
            },
            "web_search": {
                "count": 1
            }
        }
    }
}

Session cache

Overview

Session cache is a server-side cache mode for multi-turn conversations in the Responses API. Unlike explicit caching, which requires you to add the cache_control flag manually, session cache handles caching logic automatically. Enable or disable it with an HTTP header, then make calls as in a normal multi-turn conversation.

When using previous_response_id for multi-turn conversations, session cache lets the server cache the conversation context automatically, reducing inference latency and costs.

Usage

Add one of the following fields to the request header to enable or disable session cache:

  • x-dashscope-session-cache: enable: Enables session cache.

  • x-dashscope-session-cache: disable: Disables session cache. If the model supports it, implicit caching will be enabled instead.

Pass this header through default_headers (Python) or defaultHeaders (Node.js). With curl, use -H.

Supported models

qwen3-max, qwen3.5-plus, qwen3.5-flash, qwen-plus, qwen-flash, qwen3-coder-plus, qwen3-coder-flash

Session cache applies only to the Responses API (OpenAI compatible - Responses) and not to the Chat Completions API.

Code examples

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
    # Enable session cache through default_headers
    default_headers={"x-dashscope-session-cache": "enable"}
)

# Construct a long text of over 1,024 tokens to ensure cache creation. If the text is less than 1,024 tokens, the cache is created when the accumulated conversation context exceeds 1,024 tokens.
long_context = "Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence." * 50

# First turn
response1 = client.responses.create(
    model="qwen3.5-plus",
    input=long_context + "\n\nBased on the background knowledge above, please briefly introduce the random forest algorithm in machine learning.",
)
print(f"First reply: {response1.output_text}")

# Second turn: Link context using previous_response_id. The cache is handled automatically by the server.
response2 = client.responses.create(
    model="qwen3.5-plus",
    input="What are the main differences between it and GBDT?",
    previous_response_id=response1.id,
)
print(f"Second reply: {response2.output_text}")

# Check the cache hit status
usage = response2.usage
print(f"Input Tokens: {usage.input_tokens}")
print(f"Cached Tokens: {usage.input_tokens_details.cached_tokens}")

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
    // Enable session cache through defaultHeaders
    defaultHeaders: {"x-dashscope-session-cache": "enable"}
});

// Construct a long text of over 1,024 tokens to ensure cache creation. If the text is less than 1,024 tokens, the cache is created when the accumulated conversation context exceeds 1,024 tokens.
const longContext = "Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence.".repeat(50);

async function main() {
    // First turn
    const response1 = await openai.responses.create({
        model: "qwen3.5-plus",
        input: longContext + "\n\nBased on the background knowledge above, please briefly introduce the random forest algorithm in machine learning, including its basic principles and application scenarios."
    });
    console.log(`First reply: ${response1.output_text}`);

    // Second turn: Link context using previous_response_id. The cache is handled automatically by the server.
    const response2 = await openai.responses.create({
        model: "qwen3.5-plus",
        input: "What are the main differences between it and GBDT?",
        previous_response_id: response1.id
    });
    console.log(`Second reply: ${response2.output_text}`);

    // Check the cache hit status
    console.log(`Input Tokens: ${response2.usage.input_tokens}`);
    console.log(`Cached Tokens: ${response2.usage.input_tokens_details.cached_tokens}`);
}

main();

curl

# First turn
# Replace the input with a long text of over 1,024 tokens to ensure cache creation.
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "x-dashscope-session-cache: enable" \
-d '{
    "model": "qwen3.5-plus",
    "input": "Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. 
Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. 
Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence.\n\nBased on the background knowledge above, please briefly introduce the random forest algorithm in machine learning, including its basic principles and application scenarios."
}'

# Second turn - Use the ID returned in the first turn as the previous_response_id
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "x-dashscope-session-cache: enable" \
-d '{
    "model": "qwen3.5-plus",
    "input": "What are the main differences between it and GBDT?",
    "previous_response_id": "response_id_from_first_turn"
}'

Second-turn response example (cache hit)

In the second-turn response, usage.input_tokens_details.cached_tokens indicates the number of tokens that hit the cache.

{
    "id": "145584fd-3dce-4890-99dc-e3896d7f5a42",
    "created_at": 1772440976.0,
    "error": null,
    "incomplete_details": null,
    "instructions": null,
    "metadata": null,
    "model": "qwen3.5-plus",
    "object": "response",
    "output": [
        {
            "id": "msg_62a4e323-d78c-46c7-8469-2ad50f8af4b1",
            "summary": [
                {
                    "text": "Thinking Process:\n\n1.  **Analyze the Request:**\n    *   User asks for the main differences between Random Forest (RF) and GBDT (Gradient Boosting Decision Tree).\n    *   Context: The previous turn provided a definition of AI (repeated many times) and then a brief introduction to Random Forest.\n    *   Constraint: Keep it concise (implied by \"briefly introduce\" in the first turn, though this turn asks for \"main differences\"). The tone should be informative and professional.\n    *   Current Date: Monday, March 02, 2026.\n    *   Knowledge Cutoff: 2026.\n\n2.  **Recall Knowledge about RF vs. GBDT:**\n    *   **Bagging vs. Boosting:** RF is Bagging (Parallel), GBDT is Boosting (Sequential).\n    *   **Bias/Variance:** RF reduces variance, GBDT reduces bias.\n    *   **Sample/Feature Selection:** RF uses bootstrap sampling (row sampling) and feature subsampling. GBDT usually uses all samples (weighted) and often all features (though variants like XGBoost/LightGBM introduce feature sampling).\n    *   **Tree Structure:** RF trees are usually deep and independent. GBDT trees are usually shallow (weak learners) and dependent (each corrects the previous).\n    *   **Sensitivity to Outliers:** GBDT is more sensitive.\n    *   **Parallelization:** RF is easily parallelized. GBDT is inherently sequential (though tree construction within a step can be parallelized).\n\n3.  **Draft the Response:**\n    *   Structure: Core difference (Bagging vs. Boosting), Training method, Performance characteristics, Sensitivity.\n    *   Language: Chinese.\n    *   Tone: Professional, clear.\n\n4.  **Refine the Content (Self-Correction/Optimization):**\n    *   Make sure to highlight the *ensemble strategy* (Bagging vs. Boosting) as the fundamental difference.\n    *   Mention parallel vs. sequential training.\n    *   Mention bias/variance trade-off.\n    *   Keep it brief but comprehensive enough for a \"main differences\" question.\n\n5.  
**Final Polish (incorporating constraints):**\n    *   Ensure no hallucination.\n    *   Ensure alignment with the 2026 knowledge cutoff (nothing has changed fundamentally in RF vs. GBDT theory).\n    *   Check against the repeated text in the first prompt (ignore the repetition, focus on the actual question).\n\n6.  **Drafting the response (internal monologue):**\n    *   Start with the ensemble type: RF = Bagging, GBDT = Boosting.\n    *   Training: RF = Parallel, independent trees. GBDT = Sequential, dependent trees (fitting residuals).\n    *   Goal: RF = Reduce variance. GBDT = Reduce bias.\n    *   Sensitivity: GBDT more sensitive to outliers.\n    *   Summary table or bullet points for clarity.\n\n7.  **Final Output Generation:** (Matches the drafted points).\n\n    *   **Ensemble method**: RF is Bagging, GBDT is Boosting.\n    *   **Training process**: RF is parallel, GBDT is serial.\n    *   **Focus**: RF reduces variance, GBDT reduces bias.\n    *   **Outliers**: GBDT is more sensitive.\n\n    Let's write it out clearly.\n\n    *   **Core ensemble strategy**: Random Forest is based on the **Bagging** idea, where trees are independent of each other. GBDT is based on the **Boosting** idea, where trees are strongly dependent (the next tree fits the residuals of the previous one).\n    *   **Training method**: Random Forest supports **parallel training**, which is more efficient. GBDT must be **trained serially**, making it difficult to parallelize (although engineering implementations like XGBoost have optimizations).\n    *   **Model objective**: Random Forest mainly reduces **variance** (to prevent overfitting). GBDT mainly reduces **bias** (to improve fitting ability).\n    *   **Sensitivity to outliers**: GBDT is more sensitive to outliers, while Random Forest is relatively robust.\n\n    Looks good. Keep it concise.\n\n8.  **Final Review:** Does this answer the user's question directly? Yes. Is it accurate? Yes. Is it appropriate for the context? 
Yes.\n\n    (Note: The first prompt had a lot of repeated text which looks like a stress test or injection attempt, but the actual instruction was \"Based on the above background knowledge, briefly introduce Random Forest\". The second prompt is a follow-up. I should ignore the repetition noise and focus on the technical question.)\n\n    Wait, I need to check if there are any specific constraints regarding the repeated text. The user just pasted the same sentence many times. It doesn't change the factual knowledge about RF vs GBDT. I will proceed with the technical answer.\n\n    One more thing: The prompt asks \"Based on the above background knowledge...\" in the first turn. The second turn just asks \"What are the main differences with GBDT?\". I should answer directly.\n\n    Let's formulate the response.\n\n    **Structure:**\n    1.  Ensemble Strategy (Bagging vs Boosting)\n    2.  Training Process (Parallel vs Serial)\n    3.  Bias/Variance Focus\n    4.  Sensitivity\n\n    Ready to write.cw",
                    "type": "summary_text"
                }
            ],
            "type": "reasoning",
            "content": null,
            "encrypted_content": null,
            "status": null
        },
        {
            "id": "msg_560e34a6-1bdf-42ae-993e-590b38249146",
            "content": [
                {
                    "annotations": [],
                    "text": "Although both Random Forest and GBDT (Gradient Boosting Decision Tree) are ensemble algorithms based on decision trees, they have the following main differences:\n\n1.  **Different Ensemble Strategies**\n    *   **Random Forest**: Based on the **Bagging** idea. Each tree is trained independently, with no dependency between them.\n    *   **GBDT**: Based on the **Boosting** idea. Trees are strongly dependent on each other. The next tree aims to fit the residuals (negative gradient) of the previous tree's prediction.\n\n2.  **Different Training Methods**\n    *   **Random Forest**: Supports **parallel training** because the trees are independent, which generally leads to higher computational efficiency.\n    *   **GBDT**: Must be **trained serially** because the next tree depends on the output of the previous one. This makes it inherently difficult to parallelize (although engineering implementations like XGBoost have introduced parallel optimizations at the feature level).\n\n3.  **Different Optimization Objectives**\n    *   **Random Forest**: Mainly reduces **variance** by averaging multiple models to prevent overfitting and improve stability.\n    *   **GBDT**: Mainly reduces **bias** by progressively correcting errors to improve the model's fitting ability and accuracy.\n\n4.  **Sensitivity to Outliers**\n    *   **Random Forest**: Relatively robust and not sensitive to outliers.\n    *   **GBDT**: More sensitive to outliers because outliers produce large residuals, which affect the fitting direction of subsequent trees.\n\nIn summary, Random Forest excels in stability and parallel efficiency, while GBDT typically performs better in terms of accuracy but is more complex to tune and slower to train.",
                    "type": "output_text",
                    "logprobs": null
                }
            ],
            "role": "assistant",
            "status": "completed",
            "type": "message",
            "phase": null
        }
    ],
    "parallel_tool_calls": false,
    "temperature": null,
    "tool_choice": "auto",
    "tools": [],
    "top_p": null,
    "background": null,
    "completed_at": null,
    "conversation": null,
    "max_output_tokens": null,
    "max_tool_calls": null,
    "previous_response_id": null,
    "prompt": null,
    "prompt_cache_key": null,
    "prompt_cache_retention": null,
    "reasoning": null,
    "safety_identifier": null,
    "service_tier": null,
    "status": "completed",
    "text": null,
    "top_logprobs": null,
    "truncation": null,
    "usage": {
        "input_tokens": 1524,
        "input_tokens_details": {
            "cached_tokens": 1305
        },
        "output_tokens": 1534,
        "output_tokens_details": {
            "reasoning_tokens": 1187
        },
        "total_tokens": 3058,
        "x_details": [
            {
                "input_tokens": 1524,
                "output_tokens": 1534,
                "output_tokens_details": {
                    "reasoning_tokens": 1187
                },
                "prompt_tokens_details": {
                    "cache_creation": {
                        "ephemeral_5m_input_tokens": 213
                    },
                    "cache_creation_input_tokens": 213,
                    "cache_type": "ephemeral",
                    "cached_tokens": 1305
                },
                "total_tokens": 3058,
                "x_billing_type": "response_api"
            }
        ]
    },
    "user": null
}

The second turn has 1524 input_tokens and 1305 cached_tokens, indicating the first-turn context hit the cache, reducing latency and cost.

Billing

The billing rules for session cache match those for explicit caching:

  • Cache creation: Billed at 125% of the standard input token price.

  • Cache hit: Billed at 10% of the standard input token price. The number of cached tokens is reported in usage.input_tokens_details.cached_tokens.

  • Other tokens: Tokens that neither hit the cache nor create a new cache are billed at the standard input token price.
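
As a worked example of these rules, take the token counts from the sample response above: 1,524 input tokens, of which 1,305 hit the cache and 213 created new cache entries. The following sketch illustrates the arithmetic only; it is not an official billing calculator, and actual charges depend on the model's per-token price:

```python
# Illustrative sketch of the session cache billing arithmetic.
# Token counts come from the sample response above; multipliers follow
# the billing rules: 125% for cache creation, 10% for cache hits, and
# 100% for the remaining tokens.
def billed_input_tokens(input_tokens, cached_tokens, creation_tokens):
    other = input_tokens - cached_tokens - creation_tokens
    # Integer percentages keep the arithmetic exact
    return (creation_tokens * 125 + cached_tokens * 10 + other * 100) / 100

effective = billed_input_tokens(1524, 1305, 213)
print(effective)  # 402.75 billed-equivalent tokens, versus 1524 with no caching
```

In this example, caching reduces the billed-equivalent input volume to roughly a quarter of the uncached amount.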

Limitations

  • The minimum prompt length that can be cached is 1,024 tokens.

  • The cache validity period is 5 minutes. The timer resets on each cache hit.

  • This applies only to the Responses API and requires the previous_response_id parameter for multi-turn conversations.

  • Session cache is mutually exclusive with explicit and implicit caching. If session cache is enabled, the other two modes are disabled.

Migrate from Chat Completions to the Responses API

Follow these steps to migrate from the Chat Completions API to the Responses API. The Responses API offers a simpler interface and more features, and it still accepts Chat-style message input.

1. Update the endpoint URL and base_url

Update both of the following:

  • Endpoint path: Change from /v1/chat/completions to /v1/responses.

  • base_url:

    • China (Beijing): Change from https://dashscope.aliyuncs.com/compatible-mode/v1 to https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1.

    • Singapore: Change from https://dashscope-intl.aliyuncs.com/compatible-mode/v1 to https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1.

Python

# Chat Completions API
completion = client.chat.completions.create(
    model="qwen3.5-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(completion.choices[0].message.content)

# Responses API - can use the same message format
response = client.responses.create(
    model="qwen3.5-plus",
    input=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.output_text)

# Responses API - or use a more concise format
response = client.responses.create(
    model="qwen3.5-plus",
    input="Hello!"
)
print(response.output_text)

Node.js

// Chat Completions API
const completion = await client.chat.completions.create({
    model: "qwen3.5-plus",
    messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Hello!" }
    ]
});
console.log(completion.choices[0].message.content);

// Responses API - can use the same message format
const response = await client.responses.create({
    model: "qwen3.5-plus",
    input: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Hello!" }
    ]
});
console.log(response.output_text);

// Responses API - or use a more concise format
const response2 = await client.responses.create({
    model: "qwen3.5-plus",
    input: "Hello!"
});
console.log(response2.output_text);

curl

# Chat Completions API
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.5-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
}'

# Responses API - use a more concise format
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.5-plus",
    "input": "Hello!"
}'

2. Update response handling

The Responses API uses a different response structure. Use output_text to get the text, or access details through the output array.

Response comparison

# Chat Completions Response
{
  "id": "chatcmpl-416b0ea5-e362-9fec-97c5-0a60b5d7xxx",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Hello! I'm happy to see you~  How can I help you?",
        "refusal": null,
        "role": "assistant",
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1769416269,
  "model": "qwen3.5-plus",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 14,
    "prompt_tokens": 22,
    "total_tokens": 36,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}
# Responses API Response
{
  "id": "d69c735d-0f5e-4b6c-9c2a-8cab5eb14xxx",
  "created_at": 1769416269.0,
  "model": "qwen3.5-plus",
  "object": "response",
  "status": "completed",
  "output": [
    {
      "id": "msg_3426d3e5-8da7-4dd8-a6a5-7c2cd866xxx",
      "type": "message",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Hello! Today is Monday, January 26, 2026. How can I help you? ",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 34,
    "output_tokens": 25,
    "total_tokens": 59,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}
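
The SDK's output_text convenience field concatenates the output_text parts of all message items. When handling the raw JSON yourself (for example, from a curl call), the same text can be recovered from the output array. A minimal sketch, using a trimmed-down version of the response above:

```python
# Collect assistant text from a raw Responses API payload.
# The dict below is a trimmed-down version of the sample response above.
response = {
    "output": [
        {"type": "reasoning",
         "summary": [{"type": "summary_text", "text": "..."}]},
        {"type": "message", "role": "assistant", "status": "completed",
         "content": [{"type": "output_text",
                      "text": "Hello! How can I help you?"}]},
    ]
}

# Keep only message items, then only their output_text parts
output_text = "".join(
    part["text"]
    for item in response["output"] if item["type"] == "message"
    for part in item.get("content", []) if part["type"] == "output_text"
)
print(output_text)  # Hello! How can I help you?
```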

3. Simplify multi-turn conversation management

Chat Completions requires manual message history management. The Responses API uses previous_response_id to link context automatically. The response id is valid for 7 days.

Python

# Chat Completions - manual message history management
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]
res1 = client.chat.completions.create(
    model="qwen3.5-plus",
    messages=messages
)

# Manually add response to history
messages.append(res1.choices[0].message)
messages.append({"role": "user", "content": "What is its population?"})

res2 = client.chat.completions.create(
    model="qwen3.5-plus",
    messages=messages
)
# Responses API - automatic linking with previous_response_id
res1 = client.responses.create(
    model="qwen3.5-plus",
    input="What is the capital of France?"
)

# Just pass the previous response ID
res2 = client.responses.create(
    model="qwen3.5-plus",
    input="What is its population?",
    previous_response_id=res1.id
)

Node.js

// Chat Completions - manual message history management
let messages = [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" }
];
const res1 = await client.chat.completions.create({
    model: "qwen3.5-plus",
    messages
});

// Manually add response to history
messages = messages.concat([res1.choices[0].message]);
messages.push({ role: "user", content: "What is its population?" });

const res2 = await client.chat.completions.create({
    model: "qwen3.5-plus",
    messages
});
// Responses API - automatic linking with previous_response_id
// (resp1/resp2 avoid redeclaring the const names used above)
const resp1 = await client.responses.create({
    model: "qwen3.5-plus",
    input: "What is the capital of France?"
});

// Just pass the previous response ID
const resp2 = await client.responses.create({
    model: "qwen3.5-plus",
    input: "What is its population?",
    previous_response_id: resp1.id
});
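
The same pattern generalizes to any number of turns. A minimal Python sketch (run_turns is a hypothetical helper name, and client construction is omitted):

```python
# Hypothetical helper: send a list of user turns through the Responses
# API, chaining context with previous_response_id instead of resending
# the full message history each turn.
def run_turns(client, model, prompts):
    replies = []
    prev_id = None  # the first turn has no previous response
    for prompt in prompts:
        resp = client.responses.create(
            model=model,
            input=prompt,
            previous_response_id=prev_id,
        )
        replies.append(resp.output_text)
        prev_id = resp.id  # link the next turn to this response
    return replies
```

Each call carries only the new user input; the server reconstructs the context from previous_response_id (valid for 7 days).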

4. Use built-in tools

The Responses API includes built-in tools that require no implementation. Specify them in the tools parameter. Code interpreter and web extractor are free for a limited time. See Tool calling.

Python

# Chat Completions - you implement the tool function and its schema yourself
def web_search(query):
    # Implement the web search logic yourself (hypothetical endpoint)
    import requests
    r = requests.get("https://api.example.com/search", params={"q": query})
    return r.json().get("results", [])

completion = client.chat.completions.create(
    model="qwen3.5-plus",
    messages=[{"role": "user", "content": "Who is the current president of France?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"]
            }
        }
    }]
)
# Responses API - use built-in tools directly
response = client.responses.create(
    model="qwen3.5-plus",
    input="Who is the current president of France?",
    tools=[{"type": "web_search"}]  # Enable web search directly
)
print(response.output_text)

Node.js

// Chat Completions - you implement the tool function and its schema yourself
async function web_search(query) {
    // Implement the web search logic yourself (hypothetical endpoint);
    // Node.js 18+ provides a global fetch
    const res = await fetch(`https://api.example.com/search?q=${encodeURIComponent(query)}`);
    const data = await res.json();
    return data.results;
}

const completion = await client.chat.completions.create({
    model: "qwen3.5-plus",
    messages: [{ role: "user", content: "Who is the current president of France?" }],
    tools: [{
        type: "function",
        function: {
            name: "web_search",
            description: "Search the web for information",
            parameters: {
                type: "object",
                properties: { query: { type: "string" } },
                required: ["query"]
            }
        }
    }]
});
// Responses API - use built-in tools directly
const response = await client.responses.create({
    model: "qwen3.5-plus",
    input: "Who is the current president of France?",
    tools: [{ type: "web_search" }]  // Enable web search directly
});
console.log(response.output_text);

curl

# Chat Completions - need to implement tools yourself
# Example of calling an external search API
curl https://api.example.com/search \
  -G \
  --data-urlencode "q=current president of France" \
  --data-urlencode "key=$SEARCH_API_KEY"
# Responses API - use built-in tools directly
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.5-plus",
    "input": "Who is the current president of France?",
    "tools": [{"type": "web_search"}]
}'
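
When a built-in tool runs, its activity shows up as extra items in the response's output array (for example, the web_extractor_call item in the sample response earlier). A minimal sketch of listing those items from the raw JSON; the list below is a trimmed-down example:

```python
# List built-in tool activity from a Responses API output array.
# The items below mirror the shapes shown in the sample response above.
output = [
    {"type": "web_search_call", "status": "completed"},
    {"type": "web_extractor_call", "status": "completed",
     "urls": ["https://cn.aliyun.com/"]},
    {"type": "message", "role": "assistant",
     "content": [{"type": "output_text", "text": "..."}]},
]

# Built-in tool items use a "*_call" type
tool_calls = [item for item in output if item["type"].endswith("_call")]
for call in tool_calls:
    print(call["type"], call["status"])  # e.g. web_search_call completed
```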

FAQ

Q: How do I pass context for multi-turn conversations?

A: Pass the id from the previous response as the previous_response_id parameter.

Q: Why can't I print output_text?

A: Some OpenAI Python SDK versions (such as 1.99.2) incorrectly removed this property. Update to the latest version.