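Qwen models on Alibaba Cloud Model Studio support the OpenAI-compatible Responses API. As the successor to the Chat Completions API, the Responses API provides agent-native capabilities in a simpler way.
Advantages over the OpenAI Chat Completions API:
Built-in tools: web search, web extraction, code interpreter, text-to-image search, and image-to-image search are built in and give better results on complex tasks. For details, see Call built-in tools.
More flexible input: you can pass a plain string as the model input, or a Chat-style message array.
Simplified context management: pass the previous_response_id of the previous response instead of building the full message history array by hand.
For input and output parameters, see the OpenAI Responses API reference.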
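As a rough illustration of the two input forms listed above, the sketch below normalizes either form into a message array. The helper name normalize_input is hypothetical, and the string-to-user-message equivalence follows the OpenAI Responses API reference; it is assumed, not confirmed, to hold for this compatible endpoint.

```python
from typing import Union


def normalize_input(value: Union[str, list]) -> list:
    """Return a Chat-style message array for either accepted input form.

    Hypothetical helper for illustration only: the service accepts both
    forms directly, so calling code does not need this step.
    """
    if isinstance(value, str):
        # Assumption: a bare string is treated as a single user message.
        return [{"role": "user", "content": value}]
    # Already a message array; return a shallow copy.
    return list(value)


as_string = normalize_input("What can you do?")
as_messages = normalize_input([{"role": "user", "content": "What can you do?"}])
print(as_string == as_messages)
```

In practice the string form is the shorter choice for single-turn prompts, while the array form is useful when you need explicit roles.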
Prerequisites
You must first obtain an API key and set it as an environment variable. If you call the API through the OpenAI SDK, you also need to install the SDK.
Supported models
qwen3-max, qwen3-max-2026-01-23, qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen3.5-flash, qwen3.5-flash-2026-02-23, qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b, qwen-plus, qwen-flash, qwen3-coder-plus, qwen3-coder-flash.
Service endpoints
Singapore
base_url for SDK calls: https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1
HTTP request URL: POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses
China (Beijing)
base_url for SDK calls: https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1
HTTP request URL: POST https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses
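The relationship between the two addresses per region can be sketched as follows: the HTTP request URL is the SDK base_url with /responses appended. The region keys and function names here are hypothetical and chosen only for this example; the URLs are copied from the list above.

```python
# Base URLs copied from the service endpoint list above.
# The dictionary keys are hypothetical labels, not official region ids.
BASE_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
    "beijing": "https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
}


def base_url(region: str) -> str:
    """Return the SDK base_url for a region label."""
    try:
        return BASE_URLS[region.lower()]
    except KeyError:
        raise ValueError(f"unknown region: {region!r}") from None


def responses_url(region: str) -> str:
    """Return the raw HTTP endpoint: the base_url plus the /responses path."""
    return base_url(region) + "/responses"


print(responses_url("singapore"))
```

An API key is typically tied to the region it was created in, so it is worth keeping the region choice in one place rather than hard-coding the URL in several scripts.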
Code examples
Basic call
The simplest call: send one message and get the model's reply.
Python
import os
from openai import OpenAI
client = OpenAI(
# If environment variable is not set, replace with: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)
response = client.responses.create(
model="qwen3.5-plus",
input="What can you do?"
)
# Get model response
# print(response.model_dump_json())
print(response.output_text)
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
// If environment variable is not set, replace with: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});
async function main() {
const response = await openai.responses.create({
model: "qwen3.5-plus",
input: "What can you do?"
});
// Get model response
console.log(response.output_text);
}
main();
curl
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"input": "What can you do?"
}'
Response example
The following is the full response returned by the API.
{
"created_at": 1771226624,
"id": "bf0d5c2e-f14b-9ad7-bc0d-ee0c8c9ee2d8",
"model": "qwen3-max-2026-01-23",
"object": "response",
"output": [
{
"content": [
{
"annotations": [],
"text": "Hi there! I'm actually quite ......",
"type": "output_text"
}
],
"id": "msg_1e17fdb2-5fc3-4c78-a9e9-cbd78eb043f0",
"role": "assistant",
"status": "completed",
"type": "message"
}
],
"parallel_tool_calls": false,
"status": "completed",
"tool_choice": "auto",
"tools": [],
"usage": {
"input_tokens": 37,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens": 220,
"output_tokens_details": {
"reasoning_tokens": 0
},
"total_tokens": 257,
"x_details": [
{
"input_tokens": 37,
"output_tokens": 220,
"total_tokens": 257,
"x_billing_type": "response_api"
}
]
}
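}
Multi-turn conversation
Pass the previous_response_id parameter to link context automatically, with no need to build the message history by hand. Each response id is valid for 7 days.
previous_response_id must be the top-level id of the previous response (for example f0dbb153-117f-9bbf-8176-5284b47f3xxx, in UUID format), not the id of a message inside the output array (for example msg_56c860c4-3ad8-4a96-8553-d2f94c259xxx).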
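The distinction between the two ids can be sketched on a plain dictionary shaped like the response examples in this document. All function names are hypothetical. The "msg_" prefix check reflects only the sample ids shown here, and counting the 7-day window from created_at is an assumption rather than documented behavior.

```python
SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60


def looks_like_message_id(value: str) -> bool:
    # In the samples in this document, message ids carry a "msg_" prefix.
    return value.startswith("msg_")


def build_followup_request(first_response: dict, model: str, user_input: str) -> dict:
    """Build the second-round request body from the first-round response."""
    response_id = first_response["id"]  # top-level id, not output[i]["id"]
    if looks_like_message_id(response_id):
        raise ValueError("expected the top-level response id, not a message id")
    return {"model": model, "input": user_input, "previous_response_id": response_id}


def may_have_expired(created_at: float, now: float) -> bool:
    # Assumption: the 7-day validity window is counted from created_at.
    return now - created_at > SEVEN_DAYS_SECONDS


first_round = {
    "id": "f0dbb153-117f-9bbf-8176-5284b47f3xxx",
    "created_at": 1769173209.0,
    "output": [{"id": "msg_56c860c4-3ad8-4a96-8553-d2f94c259xxx", "type": "message"}],
}
request = build_followup_request(first_round, "qwen3.5-plus", "Do you remember my name?")
print(request["previous_response_id"])
```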
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)
# First round
response1 = client.responses.create(
model="qwen3.5-plus",
input="My name is John, please remember it."
)
print(f"First response: {response1.output_text}")
# Second round - use previous_response_id to link context
# The response id expires in 7 days
response2 = client.responses.create(
model="qwen3.5-plus",
input="Do you remember my name?",
previous_response_id=response1.id
)
print(f"Second response: {response2.output_text}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});
async function main() {
// First round
const response1 = await openai.responses.create({
model: "qwen3.5-plus",
input: "My name is John, please remember it."
});
console.log(`First response: ${response1.output_text}`);
// Second round - use previous_response_id to link context
// The response id expires in 7 days
const response2 = await openai.responses.create({
model: "qwen3.5-plus",
input: "Do you remember my name?",
previous_response_id: response1.id
});
console.log(`Second response: ${response2.output_text}`);
}
main();
curl
# First round
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"input": "My name is John, please remember it."
}'
# Second round - use the id from first response as previous_response_id
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"input": "Do you remember my name?",
"previous_response_id": "response_id_from_first_round"
}'
Second-round response example
{
"id": "f0dbb153-117f-9bbf-8176-5284b47f3xxx",
"created_at": 1769173209.0,
"model": "qwen3.5-plus",
"object": "response",
"status": "completed",
"output": [
{
"id": "msg_56c860c4-3ad8-4a96-8553-d2f94c259xxx",
"type": "message",
"role": "assistant",
"status": "completed",
"content": [
{
"type": "output_text",
"text": "Yes, John! I remember your name. How can I assist you today?",
"annotations": []
}
]
}
],
"usage": {
"input_tokens": 78,
"output_tokens": 16,
"total_tokens": 94,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens_details": {
"reasoning_tokens": 0
}
}
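}
Note: input_tokens in the second round is 78, which includes the first-round context, and the model correctly recalled the name "John".
Streaming output
Use streaming output to receive generated content in real time. It suits long-form text generation.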
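The event handling used with streaming can be sketched offline on plain dictionaries: text arrives in response.output_text.delta events, and the final usage arrives in response.completed. The function name collect_text is hypothetical, and the sample events are abbreviated, illustrative stand-ins rather than real server output.

```python
def collect_text(events) -> tuple:
    """Concatenate text deltas; return (text, total_tokens or None)."""
    parts = []
    total_tokens = None
    for event in events:
        if event["type"] == "response.output_text.delta":
            parts.append(event["delta"])
        elif event["type"] == "response.completed":
            # usage may be null on intermediate events, so guard with "or {}".
            usage = event["response"].get("usage") or {}
            total_tokens = usage.get("total_tokens")
    return "".join(parts), total_tokens


# Abbreviated, illustrative events; real events carry many more fields.
sample_events = [
    {"type": "response.created", "sequence_number": 0, "response": {"usage": None}},
    {"type": "response.output_text.delta", "sequence_number": 1, "delta": "Artificial "},
    {"type": "response.output_text.delta", "sequence_number": 2, "delta": "intelligence"},
    {"type": "response.completed", "sequence_number": 3,
     "response": {"usage": {"total_tokens": 203}}},
]
text, total = collect_text(sample_events)
print(text, total)
```

Other event types (for example response.output_item.added) are simply skipped here; a real client may want to handle them as well.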
Python
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)
stream = client.responses.create(
    model="qwen3.5-plus",
    input="Please briefly introduce artificial intelligence.",
    stream=True
)
print("Receiving stream output:")
for event in stream:
    # print(event.model_dump_json())  # Uncomment to see raw event response
    if event.type == 'response.output_text.delta':
        print(event.delta, end='', flush=True)
    elif event.type == 'response.completed':
        print("\nStream completed")
        print(f"Total tokens: {event.response.usage.total_tokens}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});
async function main() {
const stream = await openai.responses.create({
model: "qwen3.5-plus",
input: "Please briefly introduce artificial intelligence.",
stream: true
});
console.log("Receiving stream output:");
for await (const event of stream) {
// console.log(JSON.stringify(event)); // Uncomment to see raw event response
if (event.type === 'response.output_text.delta') {
process.stdout.write(event.delta);
} else if (event.type === 'response.completed') {
console.log("\nStream completed");
console.log(`Total tokens: ${event.response.usage.total_tokens}`);
}
}
}
main();
curl
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"input": "Please briefly introduce artificial intelligence.",
"stream": true
}'
Response example
{"response":{"id":"47a71e7d-868c-4204-9693-ef8ff9058xxx","created_at":1769417481.0,"error":null,"incomplete_details":null,"instructions":null,"metadata":null,"model":"","object":"response","output":[],"parallel_tool_calls":false,"temperature":null,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"completed_at":null,"conversation":null,"max_output_tokens":null,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"prompt_cache_retention":null,"reasoning":null,"safety_identifier":null,"service_tier":null,"status":"queued","text":null,"top_logprobs":null,"truncation":null,"usage":null,"user":null},"sequence_number":0,"type":"response.created"}
{"response":{"id":"47a71e7d-868c-4204-9693-ef8ff9058xxx","created_at":1769417481.0,"error":null,"incomplete_details":null,"instructions":null,"metadata":null,"model":"","object":"response","output":[],"parallel_tool_calls":false,"temperature":null,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"completed_at":null,"conversation":null,"max_output_tokens":null,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"prompt_cache_retention":null,"reasoning":null,"safety_identifier":null,"service_tier":null,"status":"in_progress","text":null,"top_logprobs":null,"truncation":null,"usage":null,"user":null},"sequence_number":1,"type":"response.in_progress"}
{"item":{"id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","content":[],"role":"assistant","status":"in_progress","type":"message"},"output_index":0,"sequence_number":2,"type":"response.output_item.added"}
{"content_index":0,"item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","output_index":0,"part":{"annotations":[],"text":"","type":"output_text","logprobs":null},"sequence_number":3,"type":"response.content_part.added"}
{"content_index":0,"delta":"人工智能","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":4,"type":"response.output_text.delta"}
{"content_index":0,"delta":"(Art","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":5,"type":"response.output_text.delta"}
{"content_index":0,"delta":"ificial Intelligence,","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":6,"type":"response.output_text.delta"}
{"content_index":0,"delta":"简称 AI)","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":7,"type":"response.output_text.delta"}
... (intermediate events omitted) ...
{"content_index":0,"delta":"领域,正在深刻改变我们的","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":38,"type":"response.output_text.delta"}
{"content_index":0,"delta":"生活和工作方式","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":39,"type":"response.output_text.delta"}
{"content_index":0,"delta":"。","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":40,"type":"response.output_text.delta"}
{"content_index":0,"item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":41,"text":"人工智能(Artificial Intelligence,简称 AI)是指由计算机系统模拟人类智能行为的技术和科学。xxxx","type":"response.output_text.done"}
{"content_index":0,"item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","output_index":0,"part":{"annotations":[],"text":"人工智能(Artificial Intelligence,简称 AI)是指由计算机系统模拟人类智能行为的技术和科学。xxx","type":"output_text","logprobs":null},"sequence_number":42,"type":"response.content_part.done"}
{"item":{"id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","content":[{"annotations":[],"text":"人工智能(Artificial Intelligence,简称 AI)是指由计算机系统模拟人类智能行为的技术和科学。它旨在让机器能够执行通常需要人类智能才能完成的任务,例如:\n\n- **学习**(如通过数据训练模型) \n- **推理**(如逻辑判断和问题求解) \n- **感知**(如识别图像、语音或文字) \n- **理解语言**(如自然语言处理) \n- **决策**(如在复杂环境中做出最优选择)\n\n人工智能可分为**弱人工智能**(专注于特定任务,如语音助手、推荐系统)和**强人工智能**(具备类似人类的通用智能,目前尚未实现)。\n\n当前,AI 已广泛应用于医疗、金融、交通、教育、娱乐等多个领域,正在深刻改变我们的生活和工作方式。","type":"output_text","logprobs":null}],"role":"assistant","status":"completed","type":"message"},"output_index":0,"sequence_number":43,"type":"response.output_item.done"}
{"response":{"id":"47a71e7d-868c-4204-9693-ef8ff9058xxx","created_at":1769417481.0,"error":null,"incomplete_details":null,"instructions":null,"metadata":null,"model":"qwen3.5-plus","object":"response","output":[{"id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","content":[{"annotations":[],"text":"人工智能(Artificial Intelligence,简称 AI)是xxxxxx","type":"output_text","logprobs":null}],"role":"assistant","status":"completed","type":"message"}],"parallel_tool_calls":false,"temperature":null,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"completed_at":null,"conversation":null,"max_output_tokens":null,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"prompt_cache_retention":null,"reasoning":null,"safety_identifier":null,"service_tier":null,"status":"completed","text":null,"top_logprobs":null,"truncation":null,"usage":{"input_tokens":37,"input_tokens_details":{"cached_tokens":0},"output_tokens":166,"output_tokens_details":{"reasoning_tokens":0},"total_tokens":203},"user":null},"sequence_number":44,"type":"response.completed"}
Deep thinking
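With deep thinking enabled, the model reasons before it replies, and the reasoning is returned as an output item of type reasoning. This mode suits questions that need complex reasoning.
The thinking_budget parameter, which would cap the maximum reasoning length, is not supported.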
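How the reasoning item and the final message sit side by side in the output array can be sketched on a plain dictionary shaped like the response examples in this document. The function name split_output is hypothetical, and the sample texts are shortened placeholders, not real model output.

```python
def split_output(response: dict) -> tuple:
    """Return (reasoning_text, answer_text) from a response dictionary."""
    reasoning = []
    answer = []
    for item in response.get("output", []):
        if item.get("type") == "reasoning":
            # Reasoning is carried in summary entries of type summary_text.
            reasoning.extend(s["text"] for s in item.get("summary", []))
        elif item.get("type") == "message":
            answer.extend(
                part["text"]
                for part in item.get("content", [])
                if part.get("type") == "output_text"
            )
    return "\n".join(reasoning), "".join(answer)


# Shortened placeholder data in the same shape as the response example.
sample = {
    "output": [
        {"type": "reasoning",
         "summary": [{"type": "summary_text", "text": "Compare the tenths place."}]},
        {"type": "message", "role": "assistant",
         "content": [{"type": "output_text", "text": "9.9 is larger.", "annotations": []}]},
    ],
    "usage": {"output_tokens_details": {"reasoning_tokens": 1861}},
}
thoughts, final_answer = split_output(sample)
print(final_answer)
```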
Python
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)
response = client.responses.create(
    model="qwen3.5-plus",
    input="9.9和9.11谁大?",
    extra_body={"enable_thinking": True}
)
# Process the output
for item in response.output:
    if item.type == "reasoning":
        print("=== Reasoning ===")
        for summary in item.summary:
            print(summary.text)
    elif item.type == "message":
        print("\n=== Final answer ===")
        print(item.content[0].text)
# Show the number of reasoning tokens
print(f"\nReasoning tokens: {response.usage.output_tokens_details.reasoning_tokens}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});
async function main() {
  const response = await openai.responses.create({
    model: "qwen3.5-plus",
    input: "9.9和9.11谁大?",
    enable_thinking: true
  });
  for (const item of response.output) {
    if (item.type === "reasoning") {
      console.log("=== Reasoning ===");
      for (const summary of item.summary) {
        console.log(summary.text);
      }
    } else if (item.type === "message") {
      console.log("\n=== Final answer ===");
      console.log(item.content[0].text);
    }
  }
  // Show the number of reasoning tokens
  console.log(`\nReasoning tokens: ${response.usage.output_tokens_details.reasoning_tokens}`);
}
main();
curl
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"input": "9.9和9.11谁大?",
"enable_thinking": true
}'
Response example
{
"created_at": 1774498317,
"id": "resp_xxx",
"model": "qwen3.5-plus",
"object": "response",
"output": [
{
"id": "msg_xxx",
"summary": [
{
"text": "Thinking Process:\n\n1. **Analyze the Request:**\n * Question: \"9.9 和 9.11 谁大?\" (Which is larger, 9.9 or 9.11?)\n * Context: The user is asking a simple mathematical comparison question.\n * Current Date: Thursday, March 26, 2026 (provided in the system prompt).\n * Knowledge Cutoff: 2026 (provided in the system prompt).\n\n2. **Evaluate the Numbers:**\n * Number A: 9.9\n * Number B: 9.11\n * These are decimal numbers.\n * Compare the integer part: Both are 9.\n * Compare the tenths place (first decimal digit):\n * 9.9 has 9 in the tenths place.\n * 9.11 has 1 in the tenths place.\n * Since 9 > 1, 9.9 is greater than 9.11.\n\n3. **Consider Potential Ambiguities:**\n * Could this be version numbering? (e.g., software versions). In versioning, 9.11 is often \"newer\" or \"higher\" than 9.9. However, mathematically, 9.9 > 9.11.\n * Could this be dates? (September 9th vs September 11th). 11th is later.\n * Standard interpretation: Without context, decimal numbers are assumed to be mathematical values.\n * Common pitfall: Some people mistakenly treat decimals like whole numbers (where 11 > 9), leading them to think 9.11 > 9.9. This is a known cognitive bias or misconception in elementary math.\n * Decision: Provide the mathematical answer clearly, but perhaps acknowledge the versioning context if relevant (though usually, for this specific question, it's a math test). Given the simplicity, stick to the mathematical truth first.\n\n4. **Formulate the Answer:**\n * Direct answer: 9.9 is larger.\n * Explanation: Compare place values. 9.9 = 9.90, 9.11 = 9.11. 90 hundredths > 11 hundredths.\n * Tone: Helpful, clear.\n\n5. **Drafting the Response (Internal Monologue/Draft):**\n * Okay, mathematically 9.9 is bigger. 9.9 is 9 and 9 tenths. 9.11 is 9 and 11 hundredths (or 1 tenth and 1 hundredth). 
9 tenths is more than 1 tenth.\n * So, 9.9 > 9.11.\n * I should answer in Chinese since the question is in Chinese.\n * \"9.9 更大。\" (9.9 is larger.)\n * Add a brief explanation to avoid confusion. \"因为 9.9 等于 9.90,而 9.90 大于 9.11。\" (Because 9.9 equals 9.90, and 9.90 is greater than 9.11.)\n\n6. **Refining based on \"Version Number\" possibility:**\n * Sometimes this is a trick question about software versions. In semver, 9.11 > 9.9.\n * However, usually, when asked plainly \"which is larger\", it refers to numerical value.\n * I will provide the numerical answer as primary, maybe add a note about versions if it feels necessary. But keeping it simple is better. Most likely, this is testing the decimal comparison logic.\n * Let's stick to the mathematical value as the default.\n\n7. **Final Polish:**\n * Answer: 9.9 大。\n * Reasoning: 比较小数时,先看整数部分,再看十分位。9.9 的十分位是 9,9.11 的十分位是 1。9 > 1,所以 9.9 > 9.11。\n\n8. **Output Generation:** (Matching the thought process)\n * State clearly which is larger.\n * Explain why.\n\n *Self-Correction on Date/Time:* The system prompt mentions the current date is 2026. This doesn't affect the math question, but I should ensure I don't reference anything post-2026 if I were adding context (not needed here).\n\n *Final Decision:* Just answer the math question directly.\n\n \"从数值上看,9.9 更大。\" (Numerically, 9.9 is larger.)\n Explain: 9.9 = 9.90, 9.11 = 9.11. 90 > 11.\n\n Wait, is there any other interpretation?\n - Dates? 9/9 vs 9/11. 11th is later.\n - Versions? 9.11 is newer.\n - But \"谁大\" (who is big/larger) usually implies magnitude.\n - I will answer based on numerical magnitude but briefly mention the version context if it adds value without confusing. Actually, better to be decisive. Numerical value is the standard interpretation for \"who is larger\" with numbers.\n\n Let's go with the numerical answer.\n\n **Plan:**\n 1. State 9.9 is larger numerically.\n 2. Explain the decimal comparison.\n 3. 
(Optional but helpful) Mention that if it's version numbers, 9.11 might be considered \"higher\". But prioritize the math.\n Actually, for a simple query like this, over-explaining versions might be confusing. I'll stick to the math but acknowledge the common confusion.\n\n *Revised Plan:*\n 1. Direct answer: 9.9 is larger (mathematically).\n 2. Explanation: Place value.\n 3. Note: Unless it's a version number.\n\n Let's keep it concise.\n\n \"在数学数值上,9.9 更大。\"\n \"因为 9.9 等于 9.90,比 9.11 大。\"\n \"如果是版本号,则 9.11 版本通常高于 9.9 版本。\"\n This covers both bases.\n\n Wait, looking at common AI training, this is often a \"trap\" question to see if the AI treats decimals like integers. The correct mathematical answer is crucial.\n\n Okay, final output structure:\n 1. Mathematical answer.\n 2. Brief explanation.\n 3. Contextual note (versions).\n\n Let's write it.\n \"从数学数值上看,9.9 更大。\"\n \"比较小数时,先看整数部分(都是 9),再看十分位:9.9 的十分位是 9,而 9.11 的十分位是 1。因为 9 > 1,所以 9.9 > 9.11。\"\n \"不过,如果是软件版本号,通常 9.11 版会比 9.9 版更新。\"\n\n This is balanced.\n\n Wait, I need to check the instruction again. \"Please remember the current actual time...\". This is just system context. I don't need to mention the date in the answer unless relevant. It's not relevant here.\n\n Okay, ready to generate.\n\n Wait, one more check. Is there any cultural context? In some contexts, people might say 9.11 is bigger because 11 > 9. This is the misconception I need to correct.\n So, emphasizing the mathematical rule is key.\n\n Final check:\n 9.9 = 9 + 9/10 = 9.90\n 9.11 = 9 + 1/10 + 1/100 = 9.11\n 9.90 > 9.11. Correct.\n\n Response will be in Chinese.\n \"从数值大小来看,9.9 更大。\"\n Explain why.\n Mention version exception.\n Done.\n",
"type": "summary_text"
}
],
"type": "reasoning"
},
{
"content": [
{
"annotations": [],
"text": "从**数学数值**上看,**9.9 更大**。\n\n原因如下:\n比较小数时,先看整数部分,都是 9;再看十分位(小数点后第一位):\n* 9.9 的十分位是 **9**\n* 9.11 的十分位是 **1**\n\n因为 9 大于 1,所以 **9.9 > 9.11**(可以把 9.9 看作 9.90 来比较)。\n\n**注意**:如果是**软件版本号**,通常 9.11 版会比 9.9 版更新(更高),但在纯数字大小比较中,9.9 更大。",
"type": "output_text"
}
],
"id": "msg_xxx",
"role": "assistant",
"status": "completed",
"type": "message"
}
],
"parallel_tool_calls": false,
"status": "completed",
"tool_choice": "auto",
"tools": [],
"usage": {
"input_tokens": 57,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens": 2018,
"output_tokens_details": {
"reasoning_tokens": 1861
},
"total_tokens": 2075,
"x_details": [
{
"input_tokens": 57,
"output_tokens": 2018,
"output_tokens_details": {
"reasoning_tokens": 1861
},
"total_tokens": 2075,
"x_billing_type": "response_api"
}
]
}
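}
Call built-in tools
Enabling built-in tools gives better results on complex tasks. The web extractor and code interpreter tools are currently free for a limited time. For the supported tools, see Tool calling.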
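When tools are enabled, the output array interleaves reasoning, tool-call items, and the final message. A minimal sketch of reading that trace from a plain dictionary follows; the function names are hypothetical, and the item types and fields (web_search_call, web_extractor_call, usage.x_tools) are taken from the response example in this document, so other tools may use shapes not covered here.

```python
def trace_tool_calls(response: dict) -> list:
    """Return (tool, detail) pairs in the order the tools were called."""
    steps = []
    for item in response.get("output", []):
        kind = item.get("type")
        if kind == "web_search_call":
            steps.append(("web_search", item["action"]["query"]))
        elif kind == "web_extractor_call":
            steps.append(("web_extractor", ", ".join(item.get("urls", []))))
    return steps


def tool_counts(response: dict) -> dict:
    """Return per-tool call counts reported under usage.x_tools."""
    x_tools = response.get("usage", {}).get("x_tools", {})
    return {name: info.get("count", 0) for name, info in x_tools.items()}


# Abbreviated data in the same shape as the response example.
sample = {
    "output": [
        {"type": "reasoning", "summary": []},
        {"type": "web_search_call", "status": "completed",
         "action": {"query": "阿里云官网", "type": "search", "sources": []}},
        {"type": "web_extractor_call", "status": "completed",
         "urls": ["https://cn.aliyun.com/"]},
        {"type": "message", "content": [{"type": "output_text", "text": "..."}]},
    ],
    "usage": {"x_tools": {"web_extractor": {"count": 1}, "web_search": {"count": 1}}},
}
print(trace_tool_calls(sample))
print(tool_counts(sample))
```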
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)
response = client.responses.create(
model="qwen3.5-plus",
input="Find the Alibaba Cloud website and extract key information",
# For best results, enable all the built-in tools
tools=[
{"type": "web_search"},
{"type": "code_interpreter"},
{"type": "web_extractor"}
],
extra_body={"enable_thinking": True}
)
# Uncomment the line below to see the intermediate output
# print(response.output)
print(response.output_text)
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});
async function main() {
const response = await openai.responses.create({
model: "qwen3.5-plus",
input: "Find the Alibaba Cloud website and extract key information",
tools: [
{ type: "web_search" },
{ type: "code_interpreter" },
{ type: "web_extractor" }
],
enable_thinking: true
});
for (const item of response.output) {
if (item.type === "reasoning") {
console.log("Model is thinking...");
} else if (item.type === "web_search_call") {
console.log(`Search query: ${item.action.query}`);
} else if (item.type === "web_extractor_call") {
console.log("Extracting web content...");
} else if (item.type === "message") {
console.log(`Response: ${item.content[0].text}`);
}
}
}
main();
curl
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"input": "Find the Alibaba Cloud website and extract key information",
"tools": [
{
"type": "web_search"
},
{
"type": "code_interpreter"
},
{
"type": "web_extractor"
}
],
"enable_thinking": true
}'
Response example
{
"id": "69258b21-5099-9d09-92e8-8492b1955xxx",
"object": "response",
"status": "completed",
"output": [
{
"type": "reasoning",
"summary": [
{
"type": "summary_text",
"text": "用户要求找阿里云官网并提取信息..."
}
]
},
{
"type": "web_search_call",
"status": "completed",
"action": {
"query": "阿里云官网",
"type": "search",
"sources": [
{
"type": "url",
"url": "https://cn.aliyun.com/"
},
{
"type": "url",
"url": "https://www.alibabacloud.com/zh"
}
]
}
},
{
"type": "reasoning",
"summary": [
{
"type": "summary_text",
"text": "搜索结果显示阿里云官网URL..."
}
]
},
{
"type": "web_extractor_call",
"status": "completed",
"goal": "提取阿里云官网首页的关键信息",
"output": "通义大模型、完整产品体系、AI解决方案...",
"urls": [
"https://cn.aliyun.com/"
]
},
{
"type": "message",
"role": "assistant",
"status": "completed",
"content": [
{
"type": "output_text",
"text": "阿里云官网关键信息:通义大模型,云计算服务..."
}
]
}
],
"usage": {
"input_tokens": 40836,
"output_tokens": 2106,
"total_tokens": 42942,
"output_tokens_details": {
"reasoning_tokens": 677
},
"x_tools": {
"web_extractor": {
"count": 1
},
"web_search": {
"count": 1
}
}
}
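}
Session cache
Overview
Session cache is a caching mode for multi-turn conversations with the Responses API. Unlike explicit cache, which requires you to add cache_control markers by hand, Session cache is handled automatically on the server side: you only toggle it with an HTTP header and call the API as in any normal multi-turn conversation.
When you run a multi-turn conversation with previous_response_id and Session cache is enabled, the server automatically caches the conversation context, which reduces inference latency and cost.
Usage
Add the following field to the request headers to turn Session cache on or off:
x-dashscope-session-cache: enable (turns Session cache on).
x-dashscope-session-cache: disable (turns Session cache off; implicit cache is used instead if the model supports it).
With an SDK, pass the header through the default_headers (Python) or defaultHeaders (Node.js) parameter; with curl, pass it with the -H option.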
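The header toggle described above can be sketched as a small helper that produces the same value for both the SDK and curl. The function names are hypothetical; only the header name and its two values come from this document.

```python
SESSION_CACHE_HEADER = "x-dashscope-session-cache"


def session_cache_headers(enabled: bool) -> dict:
    """Return the header dict to pass as default_headers / defaultHeaders."""
    return {SESSION_CACHE_HEADER: "enable" if enabled else "disable"}


def curl_header_flag(enabled: bool) -> str:
    """Return the equivalent -H option for a curl command line."""
    name, value = next(iter(session_cache_headers(enabled).items()))
    return f'-H "{name}: {value}"'


print(session_cache_headers(True))
print(curl_header_flag(False))
```

Keeping the toggle in one function makes it easier to switch a whole client between Session cache and implicit cache without editing each request.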
Supported models
qwen3-max, qwen3.5-plus, qwen3.5-flash, qwen-plus, qwen-flash, qwen3-coder-plus, qwen3-coder-flash
Session cache applies only to the Responses API (OpenAI compatible - Responses), not to the Chat Completions API.
Code examples
Python
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
    # Enable Session cache through default_headers
    default_headers={"x-dashscope-session-cache": "enable"}
)
# Build a long text of more than 1024 tokens so that a cache entry is created
# (below 1024 tokens, the cache is created once the accumulated conversation context exceeds 1024 tokens)
long_context = "人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。" * 50
# First round
response1 = client.responses.create(
    model="qwen3.5-plus",
    input=long_context + "\n\n基于以上背景知识,请简短介绍机器学习中的随机森林算法。",
)
print(f"First-round reply: {response1.output_text}")
# Second round: link context through previous_response_id; the server handles caching automatically
response2 = client.responses.create(
    model="qwen3.5-plus",
    input="它和 GBDT 有什么主要区别?",
    previous_response_id=response1.id,
)
print(f"Second-round reply: {response2.output_text}")
# Check the cache hit
usage = response2.usage
print(f"Input tokens: {usage.input_tokens}")
print(f"Cached tokens: {usage.input_tokens_details.cached_tokens}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
  // Enable Session cache through defaultHeaders
  defaultHeaders: {"x-dashscope-session-cache": "enable"}
});
// Build a long text of more than 1024 tokens so that a cache entry is created
// (below 1024 tokens, the cache is created once the accumulated conversation context exceeds 1024 tokens)
const longContext = "人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。".repeat(50);
async function main() {
  // First round
  const response1 = await openai.responses.create({
    model: "qwen3.5-plus",
    input: longContext + "\n\n基于以上背景知识,请简短介绍机器学习中的随机森林算法,包括基本原理和应用场景。"
  });
  console.log(`First-round reply: ${response1.output_text}`);
  // Second round: link context through previous_response_id; the server handles caching automatically
  const response2 = await openai.responses.create({
    model: "qwen3.5-plus",
    input: "它和 GBDT 有什么主要区别?",
    previous_response_id: response1.id
  });
  console.log(`Second-round reply: ${response2.output_text}`);
  // Check the cache hit
  console.log(`Input tokens: ${response2.usage.input_tokens}`);
  console.log(`Cached tokens: ${response2.usage.input_tokens_details.cached_tokens}`);
}
main();
curl
# First round
# Replace input with a long text of more than 1024 tokens so that a cache entry is created
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "x-dashscope-session-cache: enable" \
-d '{
"model": "qwen3.5-plus",
"input": "人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。人工智能是计算机科学的一个重要分支,致力于研究和开发能够模拟、延伸和扩展人类智能的理论、方法、技术及应用系统。\n\n基于以上背景知识,请简短介绍机器学习中的随机森林算法,包括基本原理和应用场景。"
}'
# Second round - use the id returned by the first round as previous_response_id
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "x-dashscope-session-cache: enable" \
-d '{
"model": "qwen3.5-plus",
"input": "它和 GBDT 有什么主要区别?",
"previous_response_id": "response_id_from_first_round"
}'
Second-round response example (cache hit)
In the second-round result, the usage.input_tokens_details.cached_tokens field is the number of tokens served from the cache.
{
"id": "145584fd-3dce-4890-99dc-e3896d7f5a42",
"created_at": 1772440976.0,
"error": null,
"incomplete_details": null,
"instructions": null,
"metadata": null,
"model": "qwen3.5-plus",
"object": "response",
"output": [
{
"id": "msg_62a4e323-d78c-46c7-8469-2ad50f8af4b1",
"summary": [
{
"text": "Thinking Process:\n\n1. **Analyze the Request:**\n * User asks for the main differences between Random Forest (RF) and GBDT (Gradient Boosting Decision Tree).\n * Context: The previous turn provided a definition of AI (repeated many times) and then a brief introduction to Random Forest.\n * Constraint: Keep it concise (implied by \"简短介绍\" in the first turn, though this turn asks for \"主要区别\" - main differences). The tone should be informative and professional.\n * Current Date: Monday, March 02, 2026.\n * Knowledge Cutoff: 2026.\n\n2. **Recall Knowledge about RF vs. GBDT:**\n * **Bagging vs. Boosting:** RF is Bagging (Parallel), GBDT is Boosting (Sequential).\n * **Bias/Variance:** RF reduces variance, GBDT reduces bias.\n * **Sample/Feature Selection:** RF uses bootstrap sampling (row sampling) and feature subsampling. GBDT usually uses all samples (weighted) and often all features (though variants like XGBoost/LightGBM introduce feature sampling).\n * **Tree Structure:** RF trees are usually deep and independent. GBDT trees are usually shallow (weak learners) and dependent (each corrects the previous).\n * **Sensitivity to Outliers:** GBDT is more sensitive.\n * **Parallelization:** RF is easily parallelized. GBDT is inherently sequential (though tree construction within a step can be parallelized).\n\n3. **Draft the Response:**\n * Structure: Core difference (Bagging vs. Boosting), Training method, Performance characteristics, Sensitivity.\n * Language: Chinese.\n * Tone: Professional, clear.\n\n4. **Refine the Content (Self-Correction/Optimization):**\n * Make sure to highlight the *ensemble strategy* (Bagging vs. Boosting) as the fundamental difference.\n * Mention parallel vs. sequential training.\n * Mention bias/variance trade-off.\n * Keep it brief but comprehensive enough for a \"main differences\" question.\n\n5. 
**Final Polish (incorporating constraints):**\n * Ensure no hallucination.\n * Ensure alignment with the 2026 knowledge cutoff (nothing has changed fundamentally in RF vs. GBDT theory).\n * Check against the repeated text in the first prompt (ignore the repetition, focus on the actual question).\n\n6. **Drafting the response (internal monologue):**\n * Start with the ensemble type: RF = Bagging, GBDT = Boosting.\n * Training: RF = Parallel, independent trees. GBDT = Sequential, dependent trees (fitting residuals).\n * Goal: RF = Reduce variance. GBDT = Reduce bias.\n * Sensitivity: GBDT more sensitive to outliers.\n * Summary table or bullet points for clarity.\n\n7. **Final Output Generation:** (Matches the drafted points).\n\n * **集成方式**:RF 是 Bagging,GBDT 是 Boosting。\n * **训练过程**:RF 并行,GBDT 串行。\n * **侧重点**:RF 降方差,GBDT 降偏差。\n * **异常值**:GBDT 更敏感。\n\n Let's write it out clearly.\n\n * **核心集成策略**:随机森林基于 **Bagging** 思想,树之间相互独立;GBDT 基于 **Boosting** 思想,树之间呈强依赖关系(后一棵树拟合前一棵树的残差)。\n * **训练方式**:随机森林支持**并行训练**,效率较高;GBDT 必须**串行训练**,难以并行(尽管工程实现如 XGBoost 做了优化)。\n * **模型目标**:随机森林主要降低**方差**(防止过拟合);GBDT 主要降低**偏差**(提升拟合能力)。\n * **异常值敏感度**:GBDT 对异常值更敏感,随机森林相对鲁棒。\n\n Looks good. Keep it concise.\n\n8. **Final Review:** Does this answer the user's question directly? Yes. Is it accurate? Yes. Is it appropriate for the context? Yes.\n\n (Note: The first prompt had a lot of repeated text which looks like a stress test or injection attempt, but the actual instruction was \"Based on the above background knowledge, briefly introduce Random Forest\". The second prompt is a follow-up. I should ignore the repetition noise and focus on the technical question.)\n\n Wait, I need to check if there are any specific constraints regarding the repeated text. The user just pasted the same sentence many times. It doesn't change the factual knowledge about RF vs GBDT. 
I will proceed with the technical answer.\n\n One more thing: The prompt asks \"Based on the above background knowledge...\" in the first turn. The second turn just asks \"What are the main differences with GBDT?\". I should answer directly.\n\n Let's formulate the response.\n\n **Structure:**\n 1. Ensemble Strategy (Bagging vs Boosting)\n 2. Training Process (Parallel vs Serial)\n 3. Bias/Variance Focus\n 4. Sensitivity\n\n Ready to write.cw",
"type": "summary_text"
}
],
"type": "reasoning",
"content": null,
"encrypted_content": null,
"status": null
},
{
"id": "msg_560e34a6-1bdf-42ae-993e-590b38249146",
"content": [
{
"annotations": [],
"text": "随机森林(Random Forest)与 GBDT(Gradient Boosting Decision Tree)虽然都是基于决策树的集成算法,但存在以下主要区别:\n\n1. **集成策略不同**\n * **随机森林**:基于 **Bagging** 思想。每棵树独立训练,彼此之间没有依赖关系。\n * **GBDT**:基于 **Boosting** 思想。树之间呈强依赖关系,后一棵树旨在拟合前一棵树预测结果的残差(负梯度)。\n\n2. **训练方式不同**\n * **随机森林**:支持**并行训练**,因为树之间独立,计算效率通常较高。\n * **GBDT**:必须**串行训练**,因为后一棵树依赖前一棵树的输出,难以天然并行(尽管工程实现如 XGBoost 在特征粒度上做了并行优化)。\n\n3. **优化目标不同**\n * **随机森林**:主要通过平均多个模型来降低**方差**(Variance),防止过拟合,提升稳定性。\n * **GBDT**:主要通过逐步修正错误来降低**偏差**(Bias),提升模型的拟合能力和精度。\n\n4. **对异常值的敏感度**\n * **随机森林**:相对鲁棒,对异常值不敏感。\n * **GBDT**:对异常值较为敏感,因为异常值会产生较大的残差,影响后续树的拟合方向。\n\n总结来说,随机森林胜在稳定和并行效率,而 GBDT 通常在精度上表现更优,但调参更复杂且训练较慢。",
"type": "output_text",
"logprobs": null
}
],
"role": "assistant",
"status": "completed",
"type": "message",
"phase": null
}
],
"parallel_tool_calls": false,
"temperature": null,
"tool_choice": "auto",
"tools": [],
"top_p": null,
"background": null,
"completed_at": null,
"conversation": null,
"max_output_tokens": null,
"max_tool_calls": null,
"previous_response_id": null,
"prompt": null,
"prompt_cache_key": null,
"prompt_cache_retention": null,
"reasoning": null,
"safety_identifier": null,
"service_tier": null,
"status": "completed",
"text": null,
"top_logprobs": null,
"truncation": null,
"usage": {
"input_tokens": 1524,
"input_tokens_details": {
"cached_tokens": 1305
},
"output_tokens": 1534,
"output_tokens_details": {
"reasoning_tokens": 1187
},
"total_tokens": 3058,
"x_details": [
{
"input_tokens": 1524,
"output_tokens": 1534,
"output_tokens_details": {
"reasoning_tokens": 1187
},
"prompt_tokens_details": {
"cache_creation": {
"ephemeral_5m_input_tokens": 213
},
"cache_creation_input_tokens": 213,
"cache_type": "ephemeral",
"cached_tokens": 1305
},
"total_tokens": 3058,
"x_billing_type": "response_api"
}
]
},
"user": null
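}
In the second turn, input_tokens is 1524 and cached_tokens is 1305, which indicates that the context from the first turn was served from the cache. This can effectively reduce inference latency and cost.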
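Billing
Session cache is billed under the same rules as explicit cache:
Cache creation: billed at 125% of the standard input token price.
Cache hit: billed at 10% of the standard input token price.
The number of cache-hit tokens is reported in the usage.input_tokens_details.cached_tokens field.
Other tokens: tokens that neither hit the cache nor create a cache entry are billed at the standard price.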
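As a worked example of the rules above, the sketch below expresses the input cost of the sample second-turn response in "standard-price token equivalents". It assumes, as the sample usage suggests but the text does not state outright, that cache-creation tokens and cache-hit tokens are disjoint subsets of input_tokens. The function name is hypothetical.

```python
# Billing rates from the rules above, as integer percentages of the
# standard input token price (integers avoid float rounding surprises).
CREATE_RATE_PCT = 125
HIT_RATE_PCT = 10
STANDARD_RATE_PCT = 100


def billed_input_equivalents(input_tokens, cached_tokens, cache_creation_tokens):
    """Return the input cost in standard-price token equivalents.

    Assumption: cached_tokens and cache_creation_tokens do not overlap
    and are both counted inside input_tokens.
    """
    other_tokens = input_tokens - cached_tokens - cache_creation_tokens
    if other_tokens < 0:
        raise ValueError("cached + cache-creation tokens exceed input_tokens")
    weighted = (
        cache_creation_tokens * CREATE_RATE_PCT
        + cached_tokens * HIT_RATE_PCT
        + other_tokens * STANDARD_RATE_PCT
    )
    return weighted / 100


# Values copied from the sample usage object of the second turn.
equivalents = billed_input_equivalents(
    input_tokens=1524,
    cached_tokens=1305,
    cache_creation_tokens=213,
)
print(equivalents)  # 402.75
```

Under these assumptions, the 1524 input tokens are billed like roughly 403 standard-price tokens: 213 × 1.25 + 1305 × 0.10 + 6 × 1.00.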
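Limitations
The minimum cacheable prompt length is 1024 tokens.
A cache entry is valid for 5 minutes, and the validity period resets on each hit.
Session cache applies only to the Responses API and must be used with the previous_response_id parameter in multi-turn conversations.
Session cache is mutually exclusive with explicit cache and implicit cache: once it is enabled, the other two modes do not take effect.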
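The first two limits can be turned into a rough client-side check, for example to decide whether a follow-up request is still likely to benefit from the session cache. This is only a sketch: the function name is hypothetical, and the behavior at exactly 5 minutes is an assumption, since the text does not specify the boundary.

```python
MIN_CACHEABLE_TOKENS = 1024   # minimum cacheable prompt length
CACHE_TTL_SECONDS = 5 * 60    # validity period, reset on every hit


def session_cache_may_hit(context_tokens: int, seconds_since_last_use: float) -> bool:
    """Estimate whether a follow-up turn can still hit the session cache.

    Assumption: the entry is treated as expired once the full TTL has elapsed.
    """
    long_enough = context_tokens >= MIN_CACHEABLE_TOKENS
    still_valid = seconds_since_last_use < CACHE_TTL_SECONDS
    return long_enough and still_valid


print(session_cache_may_hit(1305, 60))    # long context, recent use
print(session_cache_may_hit(800, 60))     # context below the minimum length
print(session_cache_may_hit(1305, 400))   # more than 5 minutes since last use
```

The server decides actual cache hits; the authoritative signal remains the cached_tokens value in the response usage.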
Migrating from Chat Completions to the Responses API
If you currently use the OpenAI Chat Completions API, you can migrate to the Responses API with the following steps. The Responses API offers a simpler interface and more powerful features while remaining compatible with Chat Completions.
1. Update the endpoint path and base_url
Update both of the following:
Endpoint path: change /v1/chat/completions to /v1/responses
base_url:
China North 2 (Beijing): change https://dashscope.aliyuncs.com/compatible-mode/v1 to https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1
Singapore: change https://dashscope-intl.aliyuncs.com/compatible-mode/v1 to https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1
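The base_url change above follows the same pattern in both regions, so it can be applied mechanically when migrating configuration. The helper below is a minimal sketch with a hypothetical name; it only rewrites the path suffix and does not validate the host.

```python
OLD_SUFFIX = "/compatible-mode/v1"
NEW_SUFFIX = "/api/v2/apps/protocols/compatible-mode/v1"


def to_responses_base_url(base_url: str) -> str:
    """Rewrite a Chat Completions base_url into the Responses API base_url."""
    url = base_url.rstrip("/")
    # The new suffix also ends with the old one, so check it first
    # to keep the function idempotent.
    if url.endswith(NEW_SUFFIX):
        return url
    if not url.endswith(OLD_SUFFIX):
        raise ValueError(f"unexpected base_url: {base_url}")
    return url[: -len(OLD_SUFFIX)] + NEW_SUFFIX


print(to_responses_base_url("https://dashscope.aliyuncs.com/compatible-mode/v1"))
print(to_responses_base_url("https://dashscope-intl.aliyuncs.com/compatible-mode/v1"))
```

The returned value is what you would pass as base_url when constructing the OpenAI client, as in the examples that follow.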
Python
# Chat Completions API
completion = client.chat.completions.create(
model="qwen3.5-plus",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
)
print(completion.choices[0].message.content)
# Responses API - can use the same message format
response = client.responses.create(
model="qwen3.5-plus",
input=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
)
print(response.output_text)
# Responses API - or use a more concise format
response = client.responses.create(
model="qwen3.5-plus",
input="Hello!"
)
print(response.output_text)
Node.js
// Chat Completions API
const completion = await client.chat.completions.create({
model: "qwen3.5-plus",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Hello!" }
]
});
console.log(completion.choices[0].message.content);
// Responses API - can use the same message format
const response = await client.responses.create({
model: "qwen3.5-plus",
input: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Hello!" }
]
});
console.log(response.output_text);
// Responses API - or use a more concise format
const response2 = await client.responses.create({
model: "qwen3.5-plus",
input: "Hello!"
});
console.log(response2.output_text);
curl
# Chat Completions API
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
}'
# Responses API - use a more concise format
curl -X POST https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-plus",
"input": "Hello!"
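}'
2. Update response handling
The response structure of the Responses API differs from that of Chat Completions. Use the output_text convenience property to get the text output, or read the output array for full details.
Response comparison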
| |
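To illustrate reading the output array directly, the sketch below walks a response in plain dict form and concatenates the text of every output_text block inside message items, skipping reasoning items. The field names follow the sample JSON shown earlier; the function name is hypothetical, and for an SDK response object a dict could be obtained with response.model_dump().

```python
def extract_output_text(response: dict) -> str:
    """Concatenate all output_text blocks found in message items."""
    parts = []
    for item in response.get("output") or []:
        # Reasoning items carry a summary instead of message content.
        if item.get("type") != "message":
            continue
        for block in item.get("content") or []:
            if block.get("type") == "output_text":
                parts.append(block.get("text") or "")
    return "".join(parts)


# Trimmed-down response in the same shape as the sample output.
sample = {
    "output": [
        {"type": "reasoning", "summary": [{"type": "summary_text", "text": "..."}], "content": None},
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Hello!", "annotations": []}],
        },
    ]
}
print(extract_output_text(sample))  # Hello!
```

This mirrors what output_text returns for simple text responses and can serve as a fallback when that property is unavailable.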
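3. Simplify multi-turn conversation management
With Chat Completions, you must maintain the message history array yourself. The Responses API provides the previous_response_id parameter to link context automatically. A response id currently remains valid for 7 days.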
Python
| |
Node.js
| |
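The chaining pattern described in this step can be sketched as follows. The helper takes any client that exposes responses.create, such as the OpenAI SDK client configured earlier in this document, and passes each response id as previous_response_id for the next turn. The names run_conversation and demo are hypothetical; demo is not called automatically because it needs network access and a valid API key.

```python
def run_conversation(client, model, prompts):
    """Send prompts in order, linking each turn to the previous response."""
    previous_id = None
    replies = []
    for prompt in prompts:
        kwargs = {"model": model, "input": prompt}
        if previous_id is not None:
            # Only follow-up turns reference an earlier response.
            kwargs["previous_response_id"] = previous_id
        response = client.responses.create(**kwargs)
        replies.append(response.output_text)
        previous_id = response.id
    return replies


def demo():
    """Example usage against the Singapore endpoint (requires the openai package)."""
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
    )
    for reply in run_conversation(client, "qwen3.5-plus", ["Hello!", "What did I just say?"]):
        print(reply)
```

Only the new user input is sent on each turn; the service resolves the earlier context from the referenced response id.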
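4. Use built-in tools
The Responses API ships with several built-in tools, so you do not need to implement them yourself; simply specify them in the tools parameter. The code interpreter and web scraping tools are currently free for a limited time. For details, see Tool calling.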
Python
| |
Node.js
| |
curl
| |
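FAQ
Q: How do I pass context across multiple turns?
A: When sending the request for a new turn, pass the id returned by the previous successful model response as the previous_response_id parameter.
Q: Why can't I print output_text?
A: Some versions of the OpenAI Python SDK (such as 1.99.2) mistakenly removed this property. Update the SDK to the latest version to avoid the error.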