Model Studio’s Qwen models support the OpenAI-compatible Responses API. As an evolution of the Chat Completions API, the Responses API delivers native agent capabilities in a concise and intuitive way.
Advantages over the OpenAI Chat Completions API:
Built-in tools: Includes built-in tools such as web search, web extractor, code interpreter, text-to-image search, and image-to-image search. These tools improve results for complex tasks. For more information, see Call built-in tools.
More flexible input: Accepts either a plain string or an array of messages in Chat format as model input.
Simplified context management: Pass the
previous_response_idfrom the previous response instead of manually constructing a full message history array.
For more information about the input and output parameters, see the OpenAI Responses API reference.
Prerequisites
First, get an API key and set it as an environment variable. If you use the OpenAI SDK, install it first.
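For example, on Linux or macOS the key can be set like this (the key value here is a placeholder):

```shell
# Replace sk-xxx with your actual Model Studio API key
export DASHSCOPE_API_KEY="sk-xxx"
# If you use the OpenAI SDK, install it:
# pip install -U openai
```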
Supported models
qwen3-max, qwen3-max-2026-01-23, qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen3.5-flash, qwen3.5-flash-2026-02-23, qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b, qwen-plus, qwen-flash, qwen3-coder-plus, qwen3-coder-flash.
Service endpoints
Singapore
base_url for SDK: https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1
HTTP endpoint: POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses
China (Beijing)
base_url for SDK: https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1
HTTP endpoint: POST https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses
Code examples
Basic call
Send a message and retrieve the model’s reply.
Python
import os
from openai import OpenAI
client = OpenAI(
# If environment variable is not set, replace with: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)
response = client.responses.create(
model="qwen3.5-plus",
input="What can you do?"
)
# Get model response
# print(response.model_dump_json())
print(response.output_text)
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
// If environment variable is not set, replace with: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});
async function main() {
const response = await openai.responses.create({
model: "qwen3.5-plus",
input: "What can you do?"
});
// Get model response
console.log(response.output_text);
}
main();
curl
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"input": "What can you do?"
}'
Response example
The API returns the following complete response.
{
"created_at": 1771226624,
"id": "bf0d5c2e-f14b-9ad7-bc0d-ee0c8c9ee2d8",
"model": "qwen3-max-2026-01-23",
"object": "response",
"output": [
{
"content": [
{
"annotations": [],
"text": "Hi there! I'm actually quite ......",
"type": "output_text"
}
],
"id": "msg_1e17fdb2-5fc3-4c78-a9e9-cbd78eb043f0",
"role": "assistant",
"status": "completed",
"type": "message"
}
],
"parallel_tool_calls": false,
"status": "completed",
"tool_choice": "auto",
"tools": [],
"usage": {
"input_tokens": 37,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens": 220,
"output_tokens_details": {
"reasoning_tokens": 0
},
"total_tokens": 257,
"x_details": [
{
"input_tokens": 37,
"output_tokens": 220,
"total_tokens": 257,
"x_billing_type": "response_api"
}
]
}
}
Multi-turn conversation
Use previous_response_id to link context instead of manually building the message history. The response id is valid for 7 days.
Use the id from the previous response (for example, f0dbb153-117f-9bbf-8176-5284b47f3xxx, in UUID format) as the value of previous_response_id. Do not use the id of a message within the output array (for example, msg_56c860c4-3ad8-4a96-8553-d2f94c259xxx).
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)
# First round
response1 = client.responses.create(
model="qwen3.5-plus",
input="My name is John, please remember it."
)
print(f"First response: {response1.output_text}")
# Second round - use previous_response_id to link context
# The response id expires in 7 days
response2 = client.responses.create(
model="qwen3.5-plus",
input="Do you remember my name?",
previous_response_id=response1.id
)
print(f"Second response: {response2.output_text}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});
async function main() {
// First round
const response1 = await openai.responses.create({
model: "qwen3.5-plus",
input: "My name is John, please remember it."
});
console.log(`First response: ${response1.output_text}`);
// Second round - use previous_response_id to link context
// The response id expires in 7 days
const response2 = await openai.responses.create({
model: "qwen3.5-plus",
input: "Do you remember my name?",
previous_response_id: response1.id
});
console.log(`Second response: ${response2.output_text}`);
}
main();
curl
# First round
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"input": "My name is John, please remember it."
}'
# Second round - use the id from first response as previous_response_id
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"input": "Do you remember my name?",
"previous_response_id": "response_id_from_first_round"
}'
Second-turn response example
{
"id": "f0dbb153-117f-9bbf-8176-5284b47f3xxx",
"created_at": 1769173209.0,
"model": "qwen3.5-plus",
"object": "response",
"status": "completed",
"output": [
{
"id": "msg_56c860c4-3ad8-4a96-8553-d2f94c259xxx",
"type": "message",
"role": "assistant",
"status": "completed",
"content": [
{
"type": "output_text",
"text": "Yes, John! I remember your name. How can I assist you today?",
"annotations": []
}
]
}
],
"usage": {
"input_tokens": 78,
"output_tokens": 16,
"total_tokens": 94,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens_details": {
"reasoning_tokens": 0
}
}
}
Note: The input_tokens for the second turn is 78, which includes the context from the first turn. The model successfully remembered the name "John".
Streaming output
Streaming output returns model-generated content in real time as it is produced, which is well suited to long text generation.
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)
stream = client.responses.create(
model="qwen3.5-plus",
input="Please briefly introduce artificial intelligence.",
stream=True
)
print("Receiving stream output:")
for event in stream:
# print(event.model_dump_json()) # Uncomment to see raw event response
if event.type == 'response.output_text.delta':
print(event.delta, end='', flush=True)
elif event.type == 'response.completed':
print("\nStream completed")
print(f"Total tokens: {event.response.usage.total_tokens}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});
async function main() {
const stream = await openai.responses.create({
model: "qwen3.5-plus",
input: "Please briefly introduce artificial intelligence.",
stream: true
});
console.log("Receiving stream output:");
for await (const event of stream) {
// console.log(JSON.stringify(event)); // Uncomment to see raw event response
if (event.type === 'response.output_text.delta') {
process.stdout.write(event.delta);
} else if (event.type === 'response.completed') {
console.log("\nStream completed");
console.log(`Total tokens: ${event.response.usage.total_tokens}`);
}
}
}
main();
curl
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"input": "Please briefly introduce artificial intelligence.",
"stream": true
}'
Response example
{"response":{"id":"47a71e7d-868c-4204-9693-ef8ff9058xxx","created_at":1769417481.0,"error":null,"incomplete_details":null,"instructions":null,"metadata":null,"model":"","object":"response","output":[],"parallel_tool_calls":false,"temperature":null,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"completed_at":null,"conversation":null,"max_output_tokens":null,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"prompt_cache_retention":null,"reasoning":null,"safety_identifier":null,"service_tier":null,"status":"queued","text":null,"top_logprobs":null,"truncation":null,"usage":null,"user":null},"sequence_number":0,"type":"response.created"}
{"response":{"id":"47a71e7d-868c-4204-9693-ef8ff9058xxx","created_at":1769417481.0,"error":null,"incomplete_details":null,"instructions":null,"metadata":null,"model":"","object":"response","output":[],"parallel_tool_calls":false,"temperature":null,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"completed_at":null,"conversation":null,"max_output_tokens":null,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"prompt_cache_retention":null,"reasoning":null,"safety_identifier":null,"service_tier":null,"status":"in_progress","text":null,"top_logprobs":null,"truncation":null,"usage":null,"user":null},"sequence_number":1,"type":"response.in_progress"}
{"item":{"id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","content":[],"role":"assistant","status":"in_progress","type":"message"},"output_index":0,"sequence_number":2,"type":"response.output_item.added"}
{"content_index":0,"item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","output_index":0,"part":{"annotations":[],"text":"","type":"output_text","logprobs":null},"sequence_number":3,"type":"response.content_part.added"}
{"content_index":0,"delta":"Artificial intelligence","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":4,"type":"response.output_text.delta"}
{"content_index":0,"delta":" (Art","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":5,"type":"response.output_text.delta"}
{"content_index":0,"delta":"ificial Intelligence, ","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":6,"type":"response.output_text.delta"}
{"content_index":0,"delta":"or AI)","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":7,"type":"response.output_text.delta"}
... (intermediate events omitted) ...
{"content_index":0,"delta":" fields, and is profoundly changing our","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":38,"type":"response.output_text.delta"}
{"content_index":0,"delta":" lives and ways of work","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":39,"type":"response.output_text.delta"}
{"content_index":0,"delta":".","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":40,"type":"response.output_text.delta"}
{"content_index":0,"item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":41,"text":"Artificial intelligence (AI) is the technology and science of simulating human intelligent behavior with computer systems. xxxx","type":"response.output_text.done"}
{"content_index":0,"item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","output_index":0,"part":{"annotations":[],"text":"Artificial intelligence (AI) is the technology and science of simulating human intelligent behavior with computer systems. xxx","type":"output_text","logprobs":null},"sequence_number":42,"type":"response.content_part.done"}
{"item":{"id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","content":[{"annotations":[],"text":"Artificial intelligence (AI) is the technology and science of simulating human intelligent behavior with computer systems. It aims to enable machines to perform tasks that typically require human intelligence, such as:\n\n- **Learning** (for example, training models with data) \n- **Reasoning** (for example, logical judgment and problem-solving) \n- **Perception** (for example, recognizing images, speech, or text) \n- **Understanding language** (for example, natural language processing) \n- **Decision-making** (for example, making optimal choices in complex environments)\n\nArtificial intelligence can be divided into **weak AI** (focusing on specific tasks, such as voice assistants and recommendation systems) and **strong AI** (possessing general intelligence similar to humans, which has not yet been achieved).\n\nCurrently, AI has been widely applied in many fields, such as healthcare, finance, transportation, education, and entertainment, and is profoundly changing our lives and ways of work.","type":"output_text","logprobs":null}],"role":"assistant","status":"completed","type":"message"},"output_index":0,"sequence_number":43,"type":"response.output_item.done"}
{"response":{"id":"47a71e7d-868c-4204-9693-ef8ff9058xxx","created_at":1769417481.0,"error":null,"incomplete_details":null,"instructions":null,"metadata":null,"model":"qwen3.5-plus","object":"response","output":[{"id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","content":[{"annotations":[],"text":"Artificial intelligence (AI) is xxxxxx","type":"output_text","logprobs":null}],"role":"assistant","status":"completed","type":"message"}],"parallel_tool_calls":false,"temperature":null,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"completed_at":null,"conversation":null,"max_output_tokens":null,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"prompt_cache_retention":null,"reasoning":null,"safety_identifier":null,"service_tier":null,"status":"completed","text":null,"top_logprobs":null,"truncation":null,"usage":{"input_tokens":37,"input_tokens_details":{"cached_tokens":0},"output_tokens":166,"output_tokens_details":{"reasoning_tokens":0},"total_tokens":203},"user":null},"sequence_number":44,"type":"response.completed"}
Deep thinking
With deep thinking mode enabled, the model reasons before replying. The thinking process appears in an output item of type reasoning, suitable for complex reasoning tasks.
The thinking_budget parameter, which limits the maximum thinking length, is not supported.
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)
response = client.responses.create(
model="qwen3.5-plus",
input="Which is larger, 9.9 or 9.11?",
extra_body={"enable_thinking": True}
)
# Process the output
for item in response.output:
if item.type == "reasoning":
print("=== Thinking Process ===")
for summary in item.summary:
print(summary.text)
elif item.type == "message":
print("\n=== Final Answer ===")
print(item.content[0].text)
# View the number of thinking tokens
print(f"\nThinking Tokens: {response.usage.output_tokens_details.reasoning_tokens}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});
async function main() {
const response = await openai.responses.create({
model: "qwen3.5-plus",
input: "Which is larger, 9.9 or 9.11?",
enable_thinking: true
});
for (const item of response.output) {
if (item.type === "reasoning") {
console.log("=== Thinking Process ===");
for (const summary of item.summary) {
console.log(summary.text);
}
} else if (item.type === "message") {
console.log("\n=== Final Answer ===");
console.log(item.content[0].text);
}
}
// View the number of thinking tokens
console.log(`\nThinking Tokens: ${response.usage.output_tokens_details.reasoning_tokens}`);
}
main();
curl
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"input": "Which is larger, 9.9 or 9.11?",
"enable_thinking": true
}'
Response example
{
"created_at": 1774498317,
"id": "resp_xxx",
"model": "qwen3.5-plus",
"object": "response",
"output": [
{
"id": "msg_xxx",
"summary": [
{
"text": "Thinking Process:\n\n1. **Analyze the Request:**\n * Question: \"Which is larger, 9.9 or 9.11?\"\n * Context: The user is asking a simple mathematical comparison question.\n * Current Date: Thursday, March 26, 2026 (provided in the system prompt).\n * Knowledge Cutoff: 2026 (provided in the system prompt).\n\n2. **Evaluate the Numbers:**\n * Number A: 9.9\n * Number B: 9.11\n * These are decimal numbers.\n * Compare the integer part: Both are 9.\n * Compare the tenths place (first decimal digit):\n * 9.9 has 9 in the tenths place.\n * 9.11 has 1 in the tenths place.\n * Since 9 > 1, 9.9 is greater than 9.11.\n\n3. **Consider Potential Ambiguities:**\n * Could this be version numbering? (e.g., software versions). In versioning, 9.11 is often \"newer\" or \"higher\" than 9.9. However, mathematically, 9.9 > 9.11.\n * Could this be dates? (September 9th vs September 11th). 11th is later.\n * Standard interpretation: Without context, decimal numbers are assumed to be mathematical values.\n * Common pitfall: Some people mistakenly treat decimals like whole numbers (where 11 > 9), leading them to think 9.11 > 9.9. This is a known cognitive bias or misconception in elementary math.\n * Decision: Provide the mathematical answer clearly, but perhaps acknowledge the versioning context if relevant (though usually, for this specific question, it's a math test). Given the simplicity, stick to the mathematical truth first.\n\n4. **Formulate the Answer:**\n * Direct answer: 9.9 is larger.\n * Explanation: Compare place values. 9.9 = 9.90, 9.11 = 9.11. 90 hundredths > 11 hundredths.\n * Tone: Helpful, clear.\n\n5. **Drafting the Response (Internal Monologue/Draft):**\n * Okay, mathematically 9.9 is bigger. 9.9 is 9 and 9 tenths. 9.11 is 9 and 11 hundredths (or 1 tenth and 1 hundredth). 9 tenths is more than 1 tenth.\n * So, 9.9 > 9.11.\n * I should answer in Chinese since the question is in Chinese.\n * \"9.9 is larger.\"\n * Add a brief explanation to avoid confusion. 
\"Because 9.9 equals 9.90, and 9.90 is greater than 9.11.\"\n\n6. **Refining based on \"Version Number\" possibility:**\n * Sometimes this is a trick question about software versions. In semver, 9.11 > 9.9.\n * However, usually, when asked plainly \"which is larger\", it refers to numerical value.\n * I will provide the numerical answer as primary, maybe add a note about versions if it feels necessary. But keeping it simple is better. Most likely, this is testing the decimal comparison logic.\n * Let's stick to the mathematical value as the default.\n\n7. **Final Polish:**\n * Answer: 9.9 is larger.\n * Reasoning: When comparing decimals, first look at the integer part, then the tenths place. The tenths digit of 9.9 is 9, and the tenths digit of 9.11 is 1. Since 9 > 1, 9.9 > 9.11.\n\n8. **Output Generation:** (Matching the thought process)\n * State clearly which is larger.\n * Explain why.\n\n *Self-Correction on Date/Time:* The system prompt mentions the current date is 2026. This doesn't affect the math question, but I should ensure I don't reference anything post-2026 if I were adding context (not needed here).\n\n *Final Decision:* Just answer the math question directly.\n\n \"Numerically, 9.9 is larger.\"\n Explain: 9.9 = 9.90, 9.11 = 9.11. 90 > 11.\n\n Wait, is there any other interpretation?\n - Dates? 9/9 vs 9/11. 11th is later.\n - Versions? 9.11 is newer.\n - But \"谁大\" (which means \"who is big/larger\") usually implies magnitude.\n - I will answer based on numerical magnitude but briefly mention the version context if it adds value without confusing. Actually, better to be decisive. Numerical value is the standard interpretation for \"who is larger\" with numbers.\n\n Let's go with the numerical answer.\n\n **Plan:**\n 1. State 9.9 is larger numerically.\n 2. Explain the decimal comparison.\n 3. (Optional but helpful) Mention that if it's version numbers, 9.11 might be considered \"higher\". 
But prioritize the math.\n Actually, for a simple query like this, over-explaining versions might be confusing. I'll stick to the math but acknowledge the common confusion.\n\n *Revised Plan:*\n 1. Direct answer: 9.9 is larger (mathematically).\n 2. Explanation: Place value.\n 3. Note: Unless it's a version number.\n\n Let's keep it concise.\n\n \"In terms of mathematical value, 9.9 is larger.\"\n \"Because 9.9 is equal to 9.90, which is larger than 9.11.\"\n \"If it's a version number, then version 9.11 is usually higher than version 9.9.\"\n This covers both bases.\n\n Wait, looking at common AI training, this is often a \"trap\" question to see if the AI treats decimals like integers. The correct mathematical answer is crucial.\n\n Okay, final output structure:\n 1. Mathematical answer.\n 2. Brief explanation.\n 3. Contextual note (versions).\n\n Let's write it.\n \"In terms of mathematical value, 9.9 is larger.\"\n \"When comparing decimals, first look at the integer part (both are 9), then the tenths place: the tenths digit of 9.9 is 9, while the tenths digit of 9.11 is 1. Because 9 > 1, 9.9 > 9.11.\"\n \"However, if these are software version numbers, version 9.11 would typically be newer than version 9.9.\"\n\n This is balanced.\n\n Wait, I need to check the instruction again. \"Please remember the current actual time...\". This is just system context. I don't need to mention the date in the answer unless relevant. It's not relevant here.\n\n Okay, ready to generate.\n\n Wait, one more check. Is there any cultural context? In some contexts, people might say 9.11 is bigger because 11 > 9. This is the misconception I need to correct.\n So, emphasizing the mathematical rule is key.\n\n Final check:\n 9.9 = 9 + 9/10 = 9.90\n 9.11 = 9 + 1/10 + 1/100 = 9.11\n 9.90 > 9.11. Correct.\n\n The response will be in Chinese.\n \"In terms of numerical value, 9.9 is larger.\"\n Explain why.\n Mention version exception.\n Done.\n",
"type": "summary_text"
}
],
"type": "reasoning"
},
{
"content": [
{
"annotations": [],
"text": "In terms of **mathematical value**, **9.9 is larger**.\n\nHere's why:\nWhen comparing decimals, you first look at the integer part, which is 9 for both. Then you look at the tenths place (the first digit after the decimal point):\n* The tenths digit of 9.9 is **9**\n* The tenths digit of 9.11 is **1**\n\nBecause 9 is greater than 1, **9.9 > 9.11** (you can think of 9.9 as 9.90 for comparison).\n\n**Note**: If these were **software version numbers**, version 9.11 would typically be considered newer (higher) than version 9.9. But in a purely numerical comparison, 9.9 is larger.",
"type": "output_text"
}
],
"id": "msg_xxx",
"role": "assistant",
"status": "completed",
"type": "message"
}
],
"parallel_tool_calls": false,
"status": "completed",
"tool_choice": "auto",
"tools": [],
"usage": {
"input_tokens": 57,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens": 2018,
"output_tokens_details": {
"reasoning_tokens": 1861
},
"total_tokens": 2075,
"x_details": [
{
"input_tokens": 57,
"output_tokens": 2018,
"output_tokens_details": {
"reasoning_tokens": 1861
},
"total_tokens": 2075,
"x_billing_type": "response_api"
}
]
}
}
Call built-in tools
Enable built-in tools to improve results for complex tasks. Web extractor and code interpreter are free for a limited time. For supported tools, see Tool calling.
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)
response = client.responses.create(
model="qwen3.5-plus",
input="Find the Alibaba Cloud website and extract key information",
# For best results, enable all the built-in tools
tools=[
{"type": "web_search"},
{"type": "code_interpreter"},
{"type": "web_extractor"}
],
extra_body={"enable_thinking": True}
)
# Uncomment the line below to see the intermediate output
# print(response.output)
print(response.output_text)
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});
async function main() {
const response = await openai.responses.create({
model: "qwen3.5-plus",
input: "Find the Alibaba Cloud website and extract key information",
tools: [
{ type: "web_search" },
{ type: "code_interpreter" },
{ type: "web_extractor" }
],
enable_thinking: true
});
for (const item of response.output) {
if (item.type === "reasoning") {
console.log("Model is thinking...");
} else if (item.type === "web_search_call") {
console.log(`Search query: ${item.action.query}`);
} else if (item.type === "web_extractor_call") {
console.log("Extracting web content...");
} else if (item.type === "message") {
console.log(`Response: ${item.content[0].text}`);
}
}
}
main();
curl
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"input": "Find the Alibaba Cloud website and extract key information",
"tools": [
{
"type": "web_search"
},
{
"type": "code_interpreter"
},
{
"type": "web_extractor"
}
],
"enable_thinking": true
}'
Response example
{
"id": "69258b21-5099-9d09-92e8-8492b1955xxx",
"object": "response",
"status": "completed",
"output": [
{
"type": "reasoning",
"summary": [
{
"type": "summary_text",
"text": "The user wants to find the Alibaba Cloud official website and extract information..."
}
]
},
{
"type": "web_search_call",
"status": "completed",
"action": {
"query": "Alibaba Cloud official website",
"type": "search",
"sources": [
{
"type": "url",
"url": "https://cn.aliyun.com/"
},
{
"type": "url",
"url": "https://www.alibabacloud.com/zh"
}
]
}
},
{
"type": "reasoning",
"summary": [
{
"type": "summary_text",
"text": "The search results show the URL of the Alibaba Cloud official website..."
}
]
},
{
"type": "web_extractor_call",
"status": "completed",
"goal": "Extract key information from the homepage of the Alibaba Cloud official website",
"output": "Qwen large language model, complete product system, AI solutions...",
"urls": [
"https://cn.aliyun.com/"
]
},
{
"type": "message",
"role": "assistant",
"status": "completed",
"content": [
{
"type": "output_text",
"text": "Key information from the Alibaba Cloud official website: Qwen large language model, cloud computing services..."
}
]
}
],
"usage": {
"input_tokens": 40836,
"output_tokens": 2106,
"total_tokens": 42942,
"output_tokens_details": {
"reasoning_tokens": 677
},
"x_tools": {
"web_extractor": {
"count": 1
},
"web_search": {
"count": 1
}
}
}
}
Session cache
Overview
Session cache is a server-side cache mode for multi-turn conversations in the Responses API. Unlike explicit caching, which requires you to add the cache_control flag manually, session cache handles caching logic automatically. Enable or disable it with an HTTP header, then make calls as in a normal multi-turn conversation.
When using previous_response_id for multi-turn conversations, session cache lets the server cache the conversation context automatically, reducing inference latency and costs.
Usage
Add one of the following fields to the request header to enable or disable session cache:
x-dashscope-session-cache: enable: Enables session cache.
x-dashscope-session-cache: disable: Disables session cache. If the model supports it, implicit caching is enabled instead.
Pass this header through default_headers (Python) or defaultHeaders (Node.js). With curl, use -H.
Supported models
qwen3-max, qwen3.5-plus, qwen3.5-flash, qwen-plus, qwen-flash, qwen3-coder-plus, qwen3-coder-flash
Session cache applies only to the Responses API (OpenAI compatible - Responses) and not to the Chat Completions API.
Code examples
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
# Enable session cache through default_headers
default_headers={"x-dashscope-session-cache": "enable"}
)
# Construct a long text of over 1,024 tokens to ensure cache creation. If the text is less than 1,024 tokens, the cache is created when the accumulated conversation context exceeds 1,024 tokens.
long_context = "Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence." * 50
# First turn
response1 = client.responses.create(
model="qwen3.5-plus",
input=long_context + "\n\nBased on the background knowledge above, please briefly introduce the random forest algorithm in machine learning.",
)
print(f"First reply: {response1.output_text}")
# Second turn: Link context using previous_response_id. The cache is handled automatically by the server.
response2 = client.responses.create(
model="qwen3.5-plus",
input="What are the main differences between it and GBDT?",
previous_response_id=response1.id,
)
print(f"Second reply: {response2.output_text}")
# Check the cache hit status
usage = response2.usage
print(f"Input Tokens: {usage.input_tokens}")
print(f"Cached Tokens: {usage.input_tokens_details.cached_tokens}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
// Enable session cache through defaultHeaders
defaultHeaders: {"x-dashscope-session-cache": "enable"}
});
// Construct a long text of over 1,024 tokens to ensure cache creation. If the text is less than 1,024 tokens, the cache is created when the accumulated conversation context exceeds 1,024 tokens.
const longContext = "Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence.".repeat(50);
async function main() {
// First turn
const response1 = await openai.responses.create({
model: "qwen3.5-plus",
input: longContext + "\n\nBased on the background knowledge above, please briefly introduce the random forest algorithm in machine learning, including its basic principles and application scenarios."
});
console.log(`First reply: ${response1.output_text}`);
// Second turn: Link context using previous_response_id. The cache is handled automatically by the server.
const response2 = await openai.responses.create({
model: "qwen3.5-plus",
input: "What are the main differences between it and GBDT?",
previous_response_id: response1.id
});
console.log(`Second reply: ${response2.output_text}`);
// Check the cache hit status
console.log(`Input Tokens: ${response2.usage.input_tokens}`);
console.log(`Cached Tokens: ${response2.usage.input_tokens_details.cached_tokens}`);
}
main();
curl
# First turn
# Replace the input with a long text of over 1,024 tokens to ensure cache creation.
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "x-dashscope-session-cache: enable" \
-d '{
"model": "qwen3.5-plus",
"input": "Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. 
Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. 
Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science, dedicated to researching and developing theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence.\n\nBased on the background knowledge above, please briefly introduce the random forest algorithm in machine learning, including its basic principles and application scenarios."
}'
# Second turn - Use the ID returned in the first turn as the previous_response_id
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "x-dashscope-session-cache: enable" \
-d '{
"model": "qwen3.5-plus",
"input": "What are the main differences between it and GBDT?",
"previous_response_id": "response_id_from_first_turn"
}'
Second-turn response example (cache hit)
In the second-turn response, usage.input_tokens_details.cached_tokens indicates the number of tokens that hit the cache.
{
"id": "145584fd-3dce-4890-99dc-e3896d7f5a42",
"created_at": 1772440976.0,
"error": null,
"incomplete_details": null,
"instructions": null,
"metadata": null,
"model": "qwen3.5-plus",
"object": "response",
"output": [
{
"id": "msg_62a4e323-d78c-46c7-8469-2ad50f8af4b1",
"summary": [
{
"text": "Thinking Process:\n\n1. **Analyze the Request:**\n * User asks for the main differences between Random Forest (RF) and GBDT (Gradient Boosting Decision Tree).\n * Context: The previous turn provided a definition of AI (repeated many times) and then a brief introduction to Random Forest.\n * Constraint: Keep it concise (implied by \"briefly introduce\" in the first turn, though this turn asks for \"main differences\"). The tone should be informative and professional.\n * Current Date: Monday, March 02, 2026.\n * Knowledge Cutoff: 2026.\n\n2. **Recall Knowledge about RF vs. GBDT:**\n * **Bagging vs. Boosting:** RF is Bagging (Parallel), GBDT is Boosting (Sequential).\n * **Bias/Variance:** RF reduces variance, GBDT reduces bias.\n * **Sample/Feature Selection:** RF uses bootstrap sampling (row sampling) and feature subsampling. GBDT usually uses all samples (weighted) and often all features (though variants like XGBoost/LightGBM introduce feature sampling).\n * **Tree Structure:** RF trees are usually deep and independent. GBDT trees are usually shallow (weak learners) and dependent (each corrects the previous).\n * **Sensitivity to Outliers:** GBDT is more sensitive.\n * **Parallelization:** RF is easily parallelized. GBDT is inherently sequential (though tree construction within a step can be parallelized).\n\n3. **Draft the Response:**\n * Structure: Core difference (Bagging vs. Boosting), Training method, Performance characteristics, Sensitivity.\n * Language: Chinese.\n * Tone: Professional, clear.\n\n4. **Refine the Content (Self-Correction/Optimization):**\n * Make sure to highlight the *ensemble strategy* (Bagging vs. Boosting) as the fundamental difference.\n * Mention parallel vs. sequential training.\n * Mention bias/variance trade-off.\n * Keep it brief but comprehensive enough for a \"main differences\" question.\n\n5. 
**Final Polish (incorporating constraints):**\n * Ensure no hallucination.\n * Ensure alignment with the 2026 knowledge cutoff (nothing has changed fundamentally in RF vs. GBDT theory).\n * Check against the repeated text in the first prompt (ignore the repetition, focus on the actual question).\n\n6. **Drafting the response (internal monologue):**\n * Start with the ensemble type: RF = Bagging, GBDT = Boosting.\n * Training: RF = Parallel, independent trees. GBDT = Sequential, dependent trees (fitting residuals).\n * Goal: RF = Reduce variance. GBDT = Reduce bias.\n * Sensitivity: GBDT more sensitive to outliers.\n * Summary table or bullet points for clarity.\n\n7. **Final Output Generation:** (Matches the drafted points).\n\n * **Ensemble method**: RF is Bagging, GBDT is Boosting.\n * **Training process**: RF is parallel, GBDT is serial.\n * **Focus**: RF reduces variance, GBDT reduces bias.\n * **Outliers**: GBDT is more sensitive.\n\n Let's write it out clearly.\n\n * **Core ensemble strategy**: Random Forest is based on the **Bagging** idea, where trees are independent of each other. GBDT is based on the **Boosting** idea, where trees are strongly dependent (the next tree fits the residuals of the previous one).\n * **Training method**: Random Forest supports **parallel training**, which is more efficient. GBDT must be **trained serially**, making it difficult to parallelize (although engineering implementations like XGBoost have optimizations).\n * **Model objective**: Random Forest mainly reduces **variance** (to prevent overfitting). GBDT mainly reduces **bias** (to improve fitting ability).\n * **Sensitivity to outliers**: GBDT is more sensitive to outliers, while Random Forest is relatively robust.\n\n Looks good. Keep it concise.\n\n8. **Final Review:** Does this answer the user's question directly? Yes. Is it accurate? Yes. Is it appropriate for the context? 
Yes.\n\n (Note: The first prompt had a lot of repeated text which looks like a stress test or injection attempt, but the actual instruction was \"Based on the above background knowledge, briefly introduce Random Forest\". The second prompt is a follow-up. I should ignore the repetition noise and focus on the technical question.)\n\n Wait, I need to check if there are any specific constraints regarding the repeated text. The user just pasted the same sentence many times. It doesn't change the factual knowledge about RF vs GBDT. I will proceed with the technical answer.\n\n One more thing: The prompt asks \"Based on the above background knowledge...\" in the first turn. The second turn just asks \"What are the main differences with GBDT?\". I should answer directly.\n\n Let's formulate the response.\n\n **Structure:**\n 1. Ensemble Strategy (Bagging vs Boosting)\n 2. Training Process (Parallel vs Serial)\n 3. Bias/Variance Focus\n 4. Sensitivity\n\n Ready to write.cw",
"type": "summary_text"
}
],
"type": "reasoning",
"content": null,
"encrypted_content": null,
"status": null
},
{
"id": "msg_560e34a6-1bdf-42ae-993e-590b38249146",
"content": [
{
"annotations": [],
"text": "Although both Random Forest and GBDT (Gradient Boosting Decision Tree) are ensemble algorithms based on decision trees, they have the following main differences:\n\n1. **Different Ensemble Strategies**\n * **Random Forest**: Based on the **Bagging** idea. Each tree is trained independently, with no dependency between them.\n * **GBDT**: Based on the **Boosting** idea. Trees are strongly dependent on each other. The next tree aims to fit the residuals (negative gradient) of the previous tree's prediction.\n\n2. **Different Training Methods**\n * **Random Forest**: Supports **parallel training** because the trees are independent, which generally leads to higher computational efficiency.\n * **GBDT**: Must be **trained serially** because the next tree depends on the output of the previous one. This makes it inherently difficult to parallelize (although engineering implementations like XGBoost have introduced parallel optimizations at the feature level).\n\n3. **Different Optimization Objectives**\n * **Random Forest**: Mainly reduces **variance** by averaging multiple models to prevent overfitting and improve stability.\n * **GBDT**: Mainly reduces **bias** by progressively correcting errors to improve the model's fitting ability and accuracy.\n\n4. **Sensitivity to Outliers**\n * **Random Forest**: Relatively robust and not sensitive to outliers.\n * **GBDT**: More sensitive to outliers because outliers produce large residuals, which affect the fitting direction of subsequent trees.\n\nIn summary, Random Forest excels in stability and parallel efficiency, while GBDT typically performs better in terms of accuracy but is more complex to tune and slower to train.",
"type": "output_text",
"logprobs": null
}
],
"role": "assistant",
"status": "completed",
"type": "message",
"phase": null
}
],
"parallel_tool_calls": false,
"temperature": null,
"tool_choice": "auto",
"tools": [],
"top_p": null,
"background": null,
"completed_at": null,
"conversation": null,
"max_output_tokens": null,
"max_tool_calls": null,
"previous_response_id": null,
"prompt": null,
"prompt_cache_key": null,
"prompt_cache_retention": null,
"reasoning": null,
"safety_identifier": null,
"service_tier": null,
"status": "completed",
"text": null,
"top_logprobs": null,
"truncation": null,
"usage": {
"input_tokens": 1524,
"input_tokens_details": {
"cached_tokens": 1305
},
"output_tokens": 1534,
"output_tokens_details": {
"reasoning_tokens": 1187
},
"total_tokens": 3058,
"x_details": [
{
"input_tokens": 1524,
"output_tokens": 1534,
"output_tokens_details": {
"reasoning_tokens": 1187
},
"prompt_tokens_details": {
"cache_creation": {
"ephemeral_5m_input_tokens": 213
},
"cache_creation_input_tokens": 213,
"cache_type": "ephemeral",
"cached_tokens": 1305
},
"total_tokens": 3058,
"x_billing_type": "response_api"
}
]
},
"user": null
}
The second turn has 1524 input_tokens and 1305 cached_tokens, indicating that the first-turn context hit the cache, reducing latency and cost.
Billing
The billing rules for session cache match those for explicit cache:
Cache creation: Billed at 125% of the standard input token price.
Cache hit: Billed at 10% of the standard input token price.
Other tokens: Tokens that neither hit the cache nor create a new cache are billed at the standard input token price.
You can view the number of cached tokens in the usage.input_tokens_details.cached_tokens parameter.
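The rules above amount to a simple weighted sum over the three token categories. A minimal sketch, assuming a placeholder unit price of 1.0 (the function name is illustrative, not part of the API):

```python
def session_cache_input_cost(cache_creation_tokens, cached_tokens,
                             other_tokens, price_per_token):
    """Estimate input-token cost under the session cache billing rules:
    cache creation at 125%, cache hits at 10%, other tokens at 100%."""
    return (cache_creation_tokens * 1.25
            + cached_tokens * 0.10
            + other_tokens * 1.00) * price_per_token

# Using the usage numbers from the example response above:
# 1524 input tokens = 213 cache-creation + 1305 cached + 6 other tokens
billable = session_cache_input_cost(213, 1305, 6, price_per_token=1.0)
print(billable)  # 402.75 token-equivalents at the standard input price
```

With the example's usage, only 6 of the 1524 input tokens are billed at full price, which is where the cost savings come from.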
Limitations
The minimum prompt length that can be cached is 1,024 tokens.
The cache validity period is 5 minutes. The timer resets on each cache hit.
This applies only to the Responses API and requires the previous_response_id parameter for multi-turn conversations.
Session cache is mutually exclusive with explicit and implicit caching. If session cache is enabled, the other two modes are disabled.
Migrate from Chat Completions to the Responses API
Follow these steps to migrate from the Chat Completions API to the Responses API. The Responses API provides a simpler interface with more features while remaining compatible with the Chat Completions message format.
1. Update the endpoint URL and base_url
Update both of the following:
Endpoint path: Change from /v1/chat/completions to /v1/responses.
base_url:
China (Beijing): Change from https://dashscope.aliyuncs.com/compatible-mode/v1 to https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1.
Singapore: Change from https://dashscope-intl.aliyuncs.com/compatible-mode/v1 to https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1.
Python
# Chat Completions API
completion = client.chat.completions.create(
model="qwen3.5-plus",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
)
print(completion.choices[0].message.content)
# Responses API - can use the same message format
response = client.responses.create(
model="qwen3.5-plus",
input=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
)
print(response.output_text)
# Responses API - or use a more concise format
response = client.responses.create(
model="qwen3.5-plus",
input="Hello!"
)
print(response.output_text)
Node.js
// Chat Completions API
const completion = await client.chat.completions.create({
model: "qwen3.5-plus",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Hello!" }
]
});
console.log(completion.choices[0].message.content);
// Responses API - can use the same message format
const response = await client.responses.create({
model: "qwen3.5-plus",
input: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Hello!" }
]
});
console.log(response.output_text);
// Responses API - or use a more concise format
const response2 = await client.responses.create({
model: "qwen3.5-plus",
input: "Hello!"
});
console.log(response2.output_text);
curl
# Chat Completions API
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
}'
# Responses API - use a more concise format
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"input": "Hello!"
}'
2. Update response handling
The Responses API uses a different response structure. Use output_text to get the text, or access details through the output array.
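The key difference: Chat Completions nests the text under choices[0].message.content, while the Responses API exposes the output_text convenience property alongside a structured output array. The helper below is an illustrative sketch of how output_text can be derived from a raw response body shaped like the JSON example above (it is not the SDK's actual implementation):

```python
def extract_output_text(response_body: dict) -> str:
    """Concatenate all output_text parts from a Responses API body,
    mirroring what the SDK's output_text convenience property returns."""
    parts = []
    for item in response_body.get("output", []):
        # Skip non-message items such as "reasoning" entries.
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    parts.append(part["text"])
    return "".join(parts)

# Minimal body shaped like the example response above:
sample = {
    "output": [
        {"type": "reasoning", "summary": [{"type": "summary_text", "text": "..."}]},
        {"type": "message", "content": [{"type": "output_text", "text": "Hello!"}]},
    ]
}
print(extract_output_text(sample))  # Hello!
```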
3. Simplify multi-turn conversation management
Chat Completions requires manual message history management. The Responses API uses previous_response_id to link context automatically. The response id is valid for 7 days.
4. Use built-in tools
The Responses API includes built-in tools that require no implementation. Specify them in the tools parameter. Code interpreter and web extractor are free for a limited time. See Tool calling.
FAQ
Q: How do I pass context for multi-turn conversations?
A: Pass the id from the previous response as the previous_response_id parameter.
Q: Why can't I print output_text?
A: Some OpenAI Python SDK versions (such as 1.99.2) incorrectly removed this property. Update to the latest version.