In applications such as real-time chat or long-form text generation, long wait times degrade the user experience and can trigger server-side timeouts, causing tasks to fail. Streaming output addresses both problems by continuously returning chunks of text as the model generates them.
How it works
Streaming output is based on the Server-Sent Events (SSE) protocol. After you make a streaming request, the server establishes a persistent HTTP connection with the client. Each time the model generates a block of text, also known as a chunk, the server immediately pushes that chunk over the connection. After all content has been generated, the server sends an end signal.
The client listens to the event stream, receiving and processing chunks in real time, for example by rendering the text character by character in the interface. This differs from a non-streaming call, which returns the entire content at once.
For reference only. No actual request is sent.
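Conceptually, consuming such a stream amounts to reading the connection line by line and decoding each data: payload until the end signal arrives. The following is a minimal, SDK-independent sketch; the sample lines are hypothetical wire data, not a real response:

```python
import json

def parse_sse_stream(lines):
    """Parse an iterable of SSE lines into decoded JSON chunks.

    Stops at the terminal sentinel `data: [DONE]`. This is a simplified
    sketch; real SSE also allows `event:`, `id:`, and comment lines,
    which are simply skipped here.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip non-data lines (events, comments, keep-alives)
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(payload)

# Simulated wire data: two content chunks followed by the end sentinel.
sample = [
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":" world"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_stream(sample))
print(text)  # Hello world
```

In practice the SDKs below do this parsing for you; the sketch only illustrates what happens on the wire.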
Billing
The billing rules for streaming output are the same as for non-streaming calls. Billing is based on the number of input and output tokens in the request.
If a request is interrupted, you are billed only for the output tokens that were generated before the server received the stop request.
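In client code, interrupting a request typically means breaking out of the chunk loop (and, depending on the SDK, also closing the stream object). The following self-contained sketch illustrates the pattern with a stand-in generator in place of a real request:

```python
def fake_stream():
    # Stand-in for a streaming response; each item is one output chunk.
    for token in ["The", " quick", " brown", " fox", " jumps"]:
        yield token

received = []
for i, chunk in enumerate(fake_stream()):
    received.append(chunk)
    if i == 2:   # e.g. the user cancelled the request
        break    # stop consuming; with a real SDK stream, also close it
print("".join(received))  # The quick brown
```

Only the chunks consumed before the break correspond to billed output tokens; the remainder of the generation is abandoned.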
Get started
Some models support only streaming calls: the open-source versions of Qwen3, and the commercial and open-source versions of QwQ, QVQ, and Qwen-Omni.
Step 1: Set your API key and select a region
Create an API key and export it as an environment variable.
Setting the API key as an environment variable (DASHSCOPE_API_KEY) is more secure than hardcoding it in your code.
Step 2: Make a streaming request
OpenAI-compatible
How to enable
Set the stream parameter to true.
View token usage
By default, the OpenAI protocol does not return token usage information. To include token usage information in the final data block, set stream_options={"include_usage": true}.
Python
import os
from openai import OpenAI
# 1. Preparation: Initialize the client.
client = OpenAI(
# We recommend configuring the API key as an environment variable to avoid hardcoding it.
api_key=os.environ["DASHSCOPE_API_KEY"],
# API keys are region-specific. Make sure base_url matches the region of your API key.
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
# 2. Make a streaming request.
completion = client.chat.completions.create(
model="qwen-plus",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Introduce yourself."}
],
stream=True,
stream_options={"include_usage": True}
)
# 3. Process the streaming response.
# Storing response chunks in a list and joining them is more efficient than repeatedly concatenating strings.
content_parts = []
print("AI: ", end="", flush=True)
for chunk in completion:
if chunk.choices:
content = chunk.choices[0].delta.content or ""
print(content, end="", flush=True)
content_parts.append(content)
elif chunk.usage:
print("\n--- Request Usage ---")
print(f"Input Tokens: {chunk.usage.prompt_tokens}")
print(f"Output Tokens: {chunk.usage.completion_tokens}")
print(f"Total Tokens: {chunk.usage.total_tokens}")
full_response = "".join(content_parts)
# print(f"\n--- Full Response ---\n{full_response}")
Sample response
AI: Hello! I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can answer questions, create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, express opinions, and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to let me know!
--- Request Usage ---
Input Tokens: 26
Output Tokens: 87
Total Tokens: 113
Node.js
import OpenAI from "openai";
async function main() {
// 1. Preparation: Initialize the client.
// We recommend configuring the API key as an environment variable to avoid hardcoding it.
if (!process.env.DASHSCOPE_API_KEY) {
throw new Error("Set the DASHSCOPE_API_KEY environment variable.");
}
// API keys are region-specific. Make sure baseURL matches the region of your API key.
// China (Beijing) region: https://dashscope.aliyuncs.com/compatible-mode/v1
// Singapore region: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
const client = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});
try {
// 2. Make a streaming request.
const stream = await client.chat.completions.create({
model: "qwen-plus",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Introduce yourself." },
],
stream: true,
// Purpose: Get the token usage for this request from the final chunk.
stream_options: { include_usage: true },
});
// 3. Process the streaming response.
const contentParts = [];
process.stdout.write("AI: ");
for await (const chunk of stream) {
// The final chunk contains no choices but does include usage information.
if (chunk.choices && chunk.choices.length > 0) {
const content = chunk.choices[0]?.delta?.content || "";
process.stdout.write(content);
contentParts.push(content);
} else if (chunk.usage) {
// Request complete. Print the token usage.
console.log("\n--- Request Usage ---");
console.log(`Input Tokens: ${chunk.usage.prompt_tokens}`);
console.log(`Output Tokens: ${chunk.usage.completion_tokens}`);
console.log(`Total Tokens: ${chunk.usage.total_tokens}`);
}
}
const fullResponse = contentParts.join("");
// console.log(`\n--- Full Response ---\n${fullResponse}`);
} catch (error) {
console.error("Request failed:", error);
}
}
main();
Sample response
AI: Hello! I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can answer questions, create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, express opinions, and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to ask me at any time!
--- Request Usage ---
Input Tokens: 26
Output Tokens: 89
Total Tokens: 115
curl
Request
# Make sure the DASHSCOPE_API_KEY environment variable is set.
# API keys are region-specific. Make sure the URL matches the region of your API key.
# China (Beijing) region URL: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# Singapore region URL: https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
--no-buffer \
-d '{
"model": "qwen-plus",
"messages": [
{"role": "user", "content": "Who are you?"}
],
"stream": true,
"stream_options": {"include_usage": true}
}'
Response
The returned data is a streaming response that follows the SSE protocol. Each line beginning with data: represents one data block.
data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}
data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}
data: {"choices":[{"delta":{"content":" a"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}
data: {"choices":[{"delta":{"content":" large-scale"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}
data: {"choices":[{"delta":{"content":" language model from Alibaba"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}
data: {"choices":[{"delta":{"content":" Cloud, and my name is Qwen"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}
data: {"choices":[{"delta":{"content":"."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}
data: {"choices":[{"finish_reason":"stop","delta":{"content":""},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}
data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":22,"completion_tokens":17,"total_tokens":39},"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}
data: [DONE]
data: The message payload, usually a JSON-formatted string.
[DONE]: Indicates that the entire streaming response has ended.
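A client can tell the two kinds of blocks apart by their shape: content blocks carry a non-empty choices list, while the final block (when include_usage is enabled) carries an empty choices list and a usage object. A minimal sketch, using simplified payloads modeled on the sample above:

```python
import json

# Simplified chunk payloads: content chunks have a non-empty "choices" list;
# the final chunk has empty "choices" and a "usage" object.
raw_chunks = [
    '{"choices":[{"delta":{"content":"I am"}}],"usage":null}',
    '{"choices":[{"delta":{"content":" Qwen."}}],"usage":null}',
    '{"choices":[],"usage":{"prompt_tokens":22,"completion_tokens":17,"total_tokens":39}}',
]

parts, usage = [], None
for raw in raw_chunks:
    chunk = json.loads(raw)
    if chunk["choices"]:
        # Content chunk: append the incremental text (may be null/empty).
        parts.append(chunk["choices"][0]["delta"].get("content") or "")
    elif chunk.get("usage"):
        # Final chunk: read the token usage.
        usage = chunk["usage"]

print("".join(parts))         # I am Qwen.
print(usage["total_tokens"])  # 39
```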
DashScope
How to enable
Depending on the method you use (Python SDK, Java SDK, or cURL):
Python SDK: Set the stream parameter to True.
Java SDK: Call the service using the streamCall interface.
cURL: Set the X-DashScope-SSE header parameter to enable.
Incremental output
The DashScope protocol supports both incremental and non-incremental streaming output.
Incremental (recommended): Each data chunk contains only the newly generated content. To enable incremental streaming, set incremental_output to true. Example: ["I ","like ","apples"]
Non-incremental: Each data chunk contains all content generated so far. This wastes network bandwidth and increases the processing load on the client. To enable non-incremental streaming, set incremental_output to false. Example: ["I ","I like ","I like apples"]
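The difference between the two modes shows up in how the client reconstructs the final text: concatenate every chunk in incremental mode, or keep only the last chunk in non-incremental mode. A minimal sketch using the example chunk lists above:

```python
def assemble(chunks, incremental):
    """Reconstruct the full text from a stream of chunks.

    incremental=True:  each chunk carries only new text -> concatenate all.
    incremental=False: each chunk carries the full text so far -> keep the last.
    """
    if incremental:
        return "".join(chunks)
    return chunks[-1] if chunks else ""

# The two wire formats from the examples above:
incremental_chunks = ["I ", "like ", "apples"]
non_incremental_chunks = ["I ", "I like ", "I like apples"]

print(assemble(incremental_chunks, True))       # I like apples
print(assemble(non_incremental_chunks, False))  # I like apples
```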
View token usage
Each data block includes real-time token usage information.
Python
import os
from http import HTTPStatus
import dashscope
from dashscope import Generation
# 1. Preparation: Configure the API key and region.
# We recommend configuring the API key as an environment variable to avoid hardcoding it.
try:
dashscope.api_key = os.environ["DASHSCOPE_API_KEY"]
except KeyError:
raise ValueError("Set the DASHSCOPE_API_KEY environment variable.")
# API keys are region-specific. Make sure base_http_api_url matches the region of your API key.
# China (Beijing) region: https://dashscope.aliyuncs.com/api/v1
# Singapore region: https://dashscope-intl.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
# 2. Make a streaming request.
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Introduce yourself."},
]
try:
responses = Generation.call(
model="qwen-plus",
messages=messages,
result_format="message",
stream=True,
# Important: Set to True for incremental output and better performance.
incremental_output=True,
)
# 3. Process the streaming response.
content_parts = []
print("AI: ", end="", flush=True)
for resp in responses:
if resp.status_code == HTTPStatus.OK:
content = resp.output.choices[0].message.content
print(content, end="", flush=True)
content_parts.append(content)
# Check whether this is the final packet.
if resp.output.choices[0].finish_reason == "stop":
usage = resp.usage
print("\n--- Request Usage ---")
print(f"Input Tokens: {usage.input_tokens}")
print(f"Output Tokens: {usage.output_tokens}")
print(f"Total Tokens: {usage.total_tokens}")
else:
# Handle errors.
print(
f"\nRequest failed: request_id={resp.request_id}, code={resp.code}, message={resp.message}"
)
break
full_response = "".join(content_parts)
# print(f"\n--- Full Response ---\n{full_response}")
except Exception as e:
print(f"An unknown error occurred: {e}")
Sample response
AI: Hello! I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can help you answer questions and create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, express opinions, and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to ask me at any time!
--- Request Usage ---
Input Tokens: 26
Output Tokens: 91
Total Tokens: 117
Java
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import io.reactivex.Flowable;
import io.reactivex.schedulers.Schedulers;
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import com.alibaba.dashscope.protocol.Protocol;
public class Main {
public static void main(String[] args) {
// 1. Get the API key.
String apiKey = System.getenv("DASHSCOPE_API_KEY");
if (apiKey == null || apiKey.isEmpty()) {
System.err.println("Set the DASHSCOPE_API_KEY environment variable.");
return;
}
// 2. Initialize the Generation instance.
// The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace baseUrl with: https://dashscope.aliyuncs.com/api/v1
// API keys are region-specific. Make sure baseUrl matches the region of your API key.
Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
CountDownLatch latch = new CountDownLatch(1);
// 3. Build the request parameters.
GenerationParam param = GenerationParam.builder()
.apiKey(apiKey)
.model("qwen-plus")
.messages(Arrays.asList(
Message.builder()
.role(Role.USER.getValue())
.content("Introduce yourself.")
.build()
))
.resultFormat(GenerationParam.ResultFormat.MESSAGE)
.incrementalOutput(true) // Enable incremental output for streaming.
.build();
// 4. Make the streaming call and process the response.
try {
Flowable<GenerationResult> result = gen.streamCall(param);
StringBuilder fullContent = new StringBuilder();
System.out.print("AI: ");
result
.subscribeOn(Schedulers.io()) // The request executes on an I/O thread.
.observeOn(Schedulers.computation()) // The response is processed on a computation thread.
.subscribe(
// onNext: Process each response chunk.
message -> {
String content = message.getOutput().getChoices().get(0).getMessage().getContent();
String finishReason = message.getOutput().getChoices().get(0).getFinishReason();
// Output the content.
System.out.print(content);
fullContent.append(content);
// A non-null finishReason marks the final chunk. Output the usage information.
if (finishReason != null && !"null".equals(finishReason)) {
System.out.println("\n--- Request Usage ---");
System.out.println("Input Tokens: " + message.getUsage().getInputTokens());
System.out.println("Output Tokens: " + message.getUsage().getOutputTokens());
System.out.println("Total Tokens: " + message.getUsage().getTotalTokens());
}
System.out.flush(); // Flush the output immediately.
},
// onError: Handle errors.
error -> {
System.err.println("\nRequest failed: " + error.getMessage());
latch.countDown();
},
// onComplete: Callback on completion.
() -> {
System.out.println(); // New line.
// System.out.println("Full response: " + fullContent.toString());
latch.countDown();
}
);
// The main thread waits for the asynchronous task to finish.
latch.await();
System.out.println("Program execution finished.");
} catch (Exception e) {
System.err.println("Request exception: " + e.getMessage());
e.printStackTrace();
}
}
}
Sample response
AI: Hello! I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can help you answer questions and create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, express opinions, and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to ask me at any time!
--- Request Usage ---
Input Tokens: 26
Output Tokens: 91
Total Tokens: 117
curl
Request
# Make sure the DASHSCOPE_API_KEY environment variable is set.
# API keys are region-specific. Make sure the URL matches the region of your API key.
# China (Beijing) region URL: https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# Singapore region URL: https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
"model": "qwen-plus",
"input":{
"messages":[
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Who are you?"
}
]
},
"parameters": {
"result_format": "message",
"incremental_output":true
}
}'
Response
The response follows the Server-Sent Events (SSE) format. Each message includes the following:
id: The data block number.
event: The event type, always result.
The HTTP status code information.
data: The JSON-formatted data payload.
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"I am","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":27,"output_tokens":1,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}
id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" Qwen","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":30,"output_tokens":4,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}
id:3
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":", an Alibaba","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":33,"output_tokens":7,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}
...
id:13
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" or need help, feel free to","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":90,"output_tokens":64,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}
id:14
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" ask me!","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":92,"output_tokens":66,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}
id:15
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":92,"output_tokens":66,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}
For multimodal models
This section applies to the Qwen-VL, Qwen-VL-OCR, and Qwen3-Omni-Captioner models.
Qwen-Omni supports only streaming output. Because its output can contain multimodal content such as text or audio, the logic for parsing its results differs slightly from that of other models. For more information, see omni-modal.
Multimodal models let you add content such as images and audio to a conversation. Their streaming-output implementation differs from that of text models in the following ways:
User message construction: The input for multimodal models includes multimodal content, such as images and audio, in addition to text.
DashScope SDK interface: With the DashScope Python SDK, call the MultiModalConversation interface. With the DashScope Java SDK, call the MultiModalConversation class.
OpenAI-compatible
Python
from openai import OpenAI
import os
client = OpenAI(
# API keys differ between the Singapore and China (Beijing) regions. To get an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured environment variables, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following base URL is for the Singapore region. If you use a model in the China (Beijing) region, replace base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-vl-plus", # You can replace this with another multimodal model and adjust the messages as needed.
messages=[
{"role": "user",
"content": [{"type": "image_url",
"image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},},
{"type": "text", "text": "What scene is depicted in the image?"}]}],
stream=True,
# stream_options={"include_usage": True}
)
full_content = ""
print("Streaming output content:")
for chunk in completion:
# If stream_options.include_usage is True, the choices field of the final chunk is an empty list and must be skipped. You can get token usage from chunk.usage.
if chunk.choices and chunk.choices[0].delta.content:
full_content += chunk.choices[0].delta.content
print(chunk.choices[0].delta.content)
print(f"Full content: {full_content}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys differ between the Singapore and China (Beijing) regions. To get an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured environment variables, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
// The following base URL is for the Singapore region. If you use a model in the China (Beijing) region, replace baseURL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3-vl-plus", // You can replace this with another multimodal model and adjust the messages as needed.
messages: [
{role: "user",
content: [{"type": "image_url",
"image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},},
{"type": "text", "text": "What scene is depicted in the image?"}]}],
stream: true,
// stream_options: { include_usage: true },
});
let fullContent = ""
console.log("Streaming output content:")
for await (const chunk of completion) {
// If stream_options.include_usage is true, the choices field of the final chunk is an empty array and must be skipped. You can get token usage from chunk.usage.
if (chunk.choices[0] && chunk.choices[0].delta.content != null) {
fullContent += chunk.choices[0].delta.content;
console.log(chunk.choices[0].delta.content);
}
}
console.log(`Full output content: ${fullContent}`)
curl
# ======= Important =======
# The following base URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# API keys differ between the Singapore and China (Beijing) regions. To get an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Remove these comments before execution ===
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen3-vl-plus",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
}
},
{
"type": "text",
"text": "What scene is depicted in the image?"
}
]
}
],
"stream":true,
"stream_options":{"include_usage":true}
}'
DashScope
Python
import os
from dashscope import MultiModalConversation
import dashscope
# The following base URL is for the Singapore region. If you use a model in the China (Beijing) region, replace base_http_api_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{
"role": "user",
"content": [
{"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
{"text": "What scene is depicted in the image?"}
]
}
]
responses = MultiModalConversation.call(
# API keys differ between the Singapore and China (Beijing) regions. To get an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured environment variables, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
model='qwen3-vl-plus', # You can replace this with another multimodal model and adjust the messages as needed.
messages=messages,
stream=True,
incremental_output=True)
full_content = ""
print("Streaming output content:")
for response in responses:
if response["output"]["choices"][0]["message"].content:
print(response.output.choices[0].message.content[0]['text'])
full_content += response.output.choices[0].message.content[0]['text']
print(f"Full content: {full_content}")
Java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following base URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void streamCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
// must create mutable map.
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"),
Collections.singletonMap("text", "What scene is depicted in the image?"))).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys differ between the Singapore and China (Beijing) regions. To get an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured environment variables, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-vl-plus") // You can replace this with another multimodal model and adjust the messages as needed.
.messages(Arrays.asList(userMessage))
.incrementalOutput(true)
.build();
Flowable<MultiModalConversationResult> result = conv.streamCall(param);
result.blockingForEach(item -> {
try {
List<Map<String, Object>> content = item.getOutput().getChoices().get(0).getMessage().getContent();
// Check that the content exists and is not empty.
if (content != null && !content.isEmpty()) {
System.out.println(content.get(0).get("text"));
}
} catch (Exception e){
System.exit(0);
}
});
}
public static void main(String[] args) {
try {
streamCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
# ======= Important =======
# API keys differ between the Singapore and China (Beijing) regions. To get an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following base URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Remove these comments before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
"model": "qwen3-vl-plus",
"input":{
"messages":[
{
"role": "user",
"content": [
{"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
{"text": "What scene is depicted in the image?"}
]
}
]
},
"parameters": {
"incremental_output": true
}
}'
For thinking models
Thinking models first return reasoning_content (the thinking process), followed by content (the response). You can determine whether the model is in the thinking phase or the answering phase based on the state of the data packet.
For more information about thinking models, see deep thinking, visual understanding, and visual reasoning.
To implement streaming output for Qwen3-Omni-Flash (thinking mode), see omni-modal.
OpenAI-compatible
The following is the response format when you use the OpenAI Python SDK to call the qwen-plus model in thinking mode with streaming output:
# Thinking phase
...
ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content='Cover all key points while')
ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content='being natural and fluent.')
# Answering phase
ChoiceDelta(content='Hello! I am **Q', function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content=None)
ChoiceDelta(content='wen** (', function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content=None)
...
If reasoning_content is not None and content is None, the model is in the thinking phase.
If reasoning_content is None and content is not None, the model is in the answering phase.
If both are None, the phase is the same as in the previous chunk.
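These rules can be expressed as a small state machine over the chunks. A minimal sketch; the sample chunk values below are hypothetical:

```python
def classify(reasoning_content, content, previous="thinking"):
    """Classify a chunk as 'thinking' or 'answering' using the rules above."""
    if reasoning_content is not None and content is None:
        return "thinking"
    if reasoning_content is None and content is not None:
        return "answering"
    return previous  # both None: same phase as the previous chunk

# Hypothetical (reasoning_content, content) pairs from a stream:
phase = "thinking"
phases = []
for rc, c in [("Let me think...", None), (None, None), (None, "Hello!"), (None, " I am Qwen.")]:
    phase = classify(rc, c, phase)
    phases.append(phase)
print(phases)  # ['thinking', 'thinking', 'answering', 'answering']
```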
Python
Sample code
from openai import OpenAI
import os
# Initialize the OpenAI client.
client = OpenAI(
# If you have not configured environment variables, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
messages = [{"role": "user", "content": "Who are you?"}]
completion = client.chat.completions.create(
model="qwen-plus", # You can replace this with another deep thinking model as needed.
messages=messages,
# The enable_thinking parameter enables the thinking process. It is not supported for the qwen3-30b-a3b-thinking-2507, qwen3-235b-a22b-thinking-2507, and QwQ models.
extra_body={"enable_thinking": True},
stream=True,
# stream_options={
# "include_usage": True
# },
)
reasoning_content = "" # Seluruh proses berpikir
answer_content = "" # Seluruh respons
is_answering = False # Apakah tahap menjawab telah dimulai
print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")
for chunk in completion:
if not chunk.choices:
print("\nUsage:")
print(chunk.usage)
continue
delta = chunk.choices[0].delta
# Kumpulkan hanya konten berpikir.
if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
if not is_answering:
print(delta.reasoning_content, end="", flush=True)
reasoning_content += delta.reasoning_content
# Menerima konten, mulai menjawab.
if hasattr(delta, "content") and delta.content:
if not is_answering:
print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
is_answering = True
print(delta.content, end="", flush=True)
answer_content += delta.content
Sample response
====================Thinking process====================
Okay, the user is asking "Who are you?". I need to provide an accurate and friendly answer. First, I must confirm my identity: Qwen, developed by the Tongyi Lab at Alibaba Group. Next, I should explain my main functions, such as answering questions, creating text, and logical reasoning. I need to maintain a friendly tone and avoid being too technical to make the user feel at ease. I should also avoid complex jargon to keep the answer simple and clear. Additionally, I might add some interactive elements, inviting the user to ask more questions to encourage further conversation. Finally, I'll check if I've missed any important information, such as my Chinese name "Tongyi Qianwen" and English name "Qwen", and my parent company and lab. I need to ensure the answer is comprehensive and meets the user's expectations.
====================Full response====================
Hello! I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can answer questions, create text, perform logical reasoning, write code, and more, all to provide users with high-quality information and services. You can call me Qwen, or just Tongyi Qianwen. How can I help you?
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';

// Initialize the OpenAI client.
const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY, // Read from the environment variable
    baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = '';
let answerContent = '';
let isAnswering = false;

async function main() {
    try {
        const messages = [{ role: 'user', content: 'Who are you?' }];
        const stream = await openai.chat.completions.create({
            // You can replace this with another Qwen3 or QwQ model as needed.
            model: 'qwen-plus',
            messages,
            stream: true,
            // The enable_thinking parameter enables the thinking process. It is not supported for the qwen3-30b-a3b-thinking-2507, qwen3-235b-a22b-thinking-2507, and QwQ models.
            enable_thinking: true
        });

        console.log('\n' + '='.repeat(20) + 'Thinking process' + '='.repeat(20) + '\n');

        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\nUsage:');
                console.log(chunk.usage);
                continue;
            }
            const delta = chunk.choices[0].delta;
            // Collect only the thinking content.
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }
            // Start answering after content is received.
            if (delta.content !== undefined && delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();
Sample response
====================Thinking process====================
Okay, the user is asking "Who are you?". I need to answer with my identity. First, I should clearly state that I am Qwen, a large-scale language model developed by Alibaba Cloud. Next, I can mention my main functions, such as answering questions, creating text, and logical reasoning. I should also emphasize my multilingual support, including Chinese and English, so the user knows I can handle requests in different languages. Additionally, I might need to explain my application scenarios, such as helping with study, work, and daily life. However, the user's question is quite direct, so I should keep it concise. I also need to ensure a friendly tone and invite the user to ask further questions. I will check for any missing important information, such as my version or latest updates, but the user probably doesn't need that level of detail. Finally, I will confirm the answer is accurate and free of errors.
====================Full response====================
I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can perform various tasks such as answering questions, creating text, logical reasoning, and coding. I support multiple languages, including Chinese and English. If you have any questions or need help, feel free to let me know!
HTTP
Sample code
curl
For open-source Qwen3 models, set enable_thinking to true to enable thinking mode. The enable_thinking parameter has no effect on the qwen3-30b-a3b-thinking-2507, qwen3-235b-a22b-thinking-2507, and QwQ models.
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "user",
            "content": "Who are you?"
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "enable_thinking": true
}'
Sample response
data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
.....
data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":10,"completion_tokens":360,"total_tokens":370},"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}
data: [DONE]
DashScope
The following is the data format when you use the DashScope Python SDK to call the qwen-plus model in thinking mode:
# Thinking stage
...
{"role": "assistant", "content": "", "reasoning_content": "informative, "}
{"role": "assistant", "content": "", "reasoning_content": "so the user finds it helpful."}
# Answering stage
{"role": "assistant", "content": "I am Qwen", "reasoning_content": ""}
{"role": "assistant", "content": ", developed by Tongyi Lab", "reasoning_content": ""}
...
If reasoning_content is not an empty string and content is an empty string, the model is in the thinking stage.
If reasoning_content is an empty string and content is not an empty string, the model is in the answering stage.
If both are empty strings, the stage is the same as in the previous chunk.
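The empty-string rules for the DashScope format can likewise be sketched as a small helper. This is an illustration only; the function name is hypothetical, and the message argument stands in for the incremental packets shown above:

```python
def classify_dashscope_stage(message, previous_stage):
    """Classify a DashScope incremental packet as 'thinking' or 'answering'.

    `message` is a dict with "content" and "reasoning_content" keys, as in
    the incremental packets above.
    """
    reasoning = message.get("reasoning_content", "")
    content = message.get("content", "")
    if reasoning != "" and content == "":
        return "thinking"
    if content != "":
        return "answering"
    # Both empty: keep the stage of the previous packet.
    return previous_stage
```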
Python
Sample code
import os
from dashscope import Generation
import dashscope

dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"

messages = [{"role": "user", "content": "Who are you?"}]
completion = Generation.call(
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # You can replace this with another deep thinking model as needed.
    model="qwen-plus",
    messages=messages,
    result_format="message",  # Open-source Qwen3 models support only "message". For a better experience, we recommend setting this parameter to "message" for other models as well.
    # Enable deep thinking. This parameter has no effect on the qwen3-30b-a3b-thinking-2507, qwen3-235b-a22b-thinking-2507, and QwQ models.
    enable_thinking=True,
    stream=True,
    incremental_output=True,  # Open-source Qwen3 models support only true. For a better experience, we recommend setting this parameter to true for other models as well.
)

# The full thinking process.
reasoning_content = ""
# The full response.
answer_content = ""
# Whether the thinking process has finished and the response is being generated.
is_answering = False

print("=" * 20 + "Thinking process" + "=" * 20)

for chunk in completion:
    # If both the thinking process and the response are empty, do nothing.
    if (
        chunk.output.choices[0].message.content == ""
        and chunk.output.choices[0].message.reasoning_content == ""
    ):
        pass
    else:
        # The current chunk is part of the thinking process.
        if (
            chunk.output.choices[0].message.reasoning_content != ""
            and chunk.output.choices[0].message.content == ""
        ):
            print(chunk.output.choices[0].message.reasoning_content, end="", flush=True)
            reasoning_content += chunk.output.choices[0].message.reasoning_content
        # The current chunk is part of the response.
        elif chunk.output.choices[0].message.content != "":
            if not is_answering:
                print("\n" + "=" * 20 + "Full response" + "=" * 20)
                is_answering = True
            print(chunk.output.choices[0].message.content, end="", flush=True)
            answer_content += chunk.output.choices[0].message.content

# To print the full thinking process and response, uncomment and run the following code.
# print("=" * 20 + "Full thinking process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Full response" + "=" * 20 + "\n")
# print(f"{answer_content}")
Sample response
====================Thinking process====================
Okay, the user is asking, "Who are you?" I need to answer this question. First, I must clarify my identity: Qwen, a large-scale language model developed by Alibaba Cloud. Next, I should explain my functions and purposes, such as answering questions, creating text, and logical reasoning. I should also emphasize my goal of being a helpful assistant to users, providing help and support.
When responding, I should maintain a conversational tone and avoid using technical jargon or complex sentence structures. I can use friendly expressions, like "Hello there! ~", to make the conversation more natural. I also need to ensure the information is accurate and does not omit key points, such as my developer, main functions, and application scenarios.
I should also consider potential follow-up questions from the user, such as specific application examples or technical details. So, I can subtly set up opportunities in my answer to guide the user to ask more questions. For example, by mentioning, "Whether it's a question about daily life or a professional field, I can do my best to help," which is both comprehensive and open-ended.
Finally, I will check if the response is fluent, without repetition or redundancy, ensuring it is concise and clear. I will also maintain a balance between being friendly and professional, so the user feels that I am both approachable and reliable.
====================Full response====================
Hello there! ~ I am Qwen, a large-scale language model developed by Alibaba Cloud. I can answer questions, create text, perform logical reasoning, write code, and more, all to provide help and support to users. Whether it's a question about daily life or a professional field, I can do my best to help. How can I assist you?
Java
Sample code
// DashScope SDK version >= 2.19.4
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    static {
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;

    private static void handleGenerationResult(GenerationResult message) {
        String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String content = message.getOutput().getChoices().get(0).getMessage().getContent();
        if (!reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Thinking process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }
        if (!content.isEmpty()) {
            finalContent.append(content);
            if (!isFirstPrint) {
                System.out.println("\n====================Full response====================");
                isFirstPrint = true;
            }
            System.out.print(content);
        }
    }

    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                // If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-plus")
                .enableThinking(true)
                .incrementalOutput(true)
                .resultFormat("message")
                .messages(Arrays.asList(userMsg))
                .build();
    }

    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.blockingForEach(message -> handleGenerationResult(message));
    }

    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
            streamCallWithMessage(gen, userMsg);
            // Print the final result.
            // if (reasoningContent.length() > 0) {
            //     System.out.println("\n====================Full response====================");
            //     System.out.println(finalContent.toString());
            // }
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            logger.error("An exception occurred: {}", e.getMessage());
        }
        System.exit(0);
    }
}
Result
====================Thinking process====================
Okay, the user is asking "Who are you?". I need to answer based on my previous settings. First, my role is Qwen, a large-scale language model from Alibaba Group. I need to keep my language conversational, simple, and easy to understand.
The user might be new to me or wants to confirm my identity. I should first directly answer who I am, then briefly explain my functions and uses, such as answering questions, creating text, and coding. I also need to mention my multilingual support so the user knows I can handle requests in different languages.
Also, according to the guidelines, I need to maintain a human-like personality, so my tone should be friendly, and I might use emojis to add a touch of warmth. I might also need to guide the user to ask further questions or use my functions, for example, by asking them what they need help with.
I need to be careful not to use complex jargon and avoid being long-winded. I will check for any missed key points, such as multilingual support and specific capabilities. I will ensure the answer meets all requirements, including being conversational and concise.
====================Full response====================
Hello! I am Qwen, a large-scale language model from Alibaba Group. I can answer questions, create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, express opinions, and play games. I am proficient in multiple languages, including but not limited to Chinese, English, German, French, and Spanish. Is there anything I can help you with?
HTTP
Sample code
curl
For hybrid thinking models, set enable_thinking to true to enable thinking mode. The enable_thinking parameter has no effect on the qwen3-30b-a3b-thinking-2507, qwen3-235b-a22b-thinking-2507, and QwQ models.
# ======= Important =======
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === Remove these comments before execution ===
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "qwen-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": "Who are you?"
            }
        ]
    },
    "parameters":{
        "enable_thinking": true,
        "incremental_output": true,
        "result_format": "message"
    }
}'
Sample response
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Hmm","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"input_tokens":11,"output_tokens":3},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"input_tokens":11,"output_tokens":4},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:3
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":" the user","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":16,"input_tokens":11,"output_tokens":5},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:4
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":" is asking","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":17,"input_tokens":11,"output_tokens":6},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:5
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":" '","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":18,"input_tokens":11,"output_tokens":7},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
......
id:358
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" help","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":373,"input_tokens":11,"output_tokens":362},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:359
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":",","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":374,"input_tokens":11,"output_tokens":363},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:360
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" feel free to","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":375,"input_tokens":11,"output_tokens":364},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:361
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" let me","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":376,"input_tokens":11,"output_tokens":365},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:362
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" know","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":377,"input_tokens":11,"output_tokens":366},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:363
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
id:364
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
Go to production
Performance and resource management: In a backend service, each streaming request keeps a persistent HTTP connection open, which consumes resources. Make sure your service is configured with an appropriate connection pool size and timeout period. In high-concurrency scenarios, monitor the service's file descriptor usage to prevent exhaustion.
Client-side rendering: In a web frontend, use the ReadableStream and TextDecoderStream APIs to process and render the SSE event stream smoothly for an optimal user experience.
Usage and performance monitoring:
Key metrics: Monitor time to first token (TTFT), the core metric for the streaming experience, along with the API error rate and average response time.
Alerting: Set up alerts for abnormal API error rates, especially 4xx and 5xx errors.
Nginx proxy configuration: If you use Nginx as a reverse proxy, its default output buffering (proxy_buffering) interferes with real-time streaming responses. To ensure data is pushed to the client immediately, set proxy_buffering off in the Nginx configuration file to disable this feature.
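The Nginx guidance above can be captured in a minimal configuration fragment. This is a sketch only; the upstream name and location path are placeholders to adapt to your own deployment:

```nginx
location /chat/ {
    proxy_pass http://backend_upstream;   # placeholder upstream name
    proxy_http_version 1.1;
    proxy_set_header Connection "";       # keep the upstream connection open
    proxy_buffering off;                  # push SSE chunks to the client immediately
    proxy_cache off;
    proxy_read_timeout 300s;              # allow long-running generations
}
```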
Error codes
If a call fails, see Error messages for troubleshooting.
FAQ
Q: Why does the returned data not include usage information?
A: By default, the OpenAI protocol does not return usage information. Set the stream_options parameter to include usage information in the final chunk.
Q: Does streaming output affect the quality of model responses?
A: No. However, some models support only streaming output, and non-streaming calls may cause timeout errors. We recommend using streaming output.