
Alibaba Cloud Model Studio: Streaming

Last updated: Dec 13, 2025

In applications such as real-time chat or long-form text generation, long wait times degrade the user experience and can trigger server-side timeouts that cause tasks to fail. Streaming output addresses both problems by continuously returning chunks of text as the model generates them.

How it works

Streaming output is based on the Server-Sent Events (SSE) protocol. After you make a streaming request, the server establishes a persistent HTTP connection with the client. Each time the model generates a block of text, known as a chunk, the server immediately pushes it over that connection. After all content is generated, the server sends an end signal.

The client listens to the event stream and processes each chunk in real time, for example by rendering text character by character in the UI. This differs from a non-streaming call, which returns the entire content at once.
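
The contrast can be sketched with a mock generator; no real request is sent, and the hard-coded chunks below are purely illustrative:

```python
import time

def mock_stream():
    """Simulates a model pushing chunks over an SSE connection."""
    for chunk in ["Stream", "ing ", "output ", "works."]:
        time.sleep(0.05)  # simulated generation delay
        yield chunk

# Non-streaming: wait for everything, then show it all at once.
full = "".join(mock_stream())

# Streaming: render each chunk the moment it arrives.
parts = []
for chunk in mock_stream():
    print(chunk, end="", flush=True)  # appears incrementally
    parts.append(chunk)
print()

assert "".join(parts) == full  # both modes yield the same final text
```

Both modes produce the same final text; streaming only changes when the client gets to see it.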


Billing

The billing rules for streaming output are the same as for non-streaming calls. Billing is based on the number of input and output tokens in the request.

If a request is interrupted, you are billed only for the output tokens that were generated before the server received the termination request.

Get started

Important

Some models support only streaming calls: the open-source versions of Qwen3, and the commercial and open-source versions of QwQ, QVQ, and Qwen-Omni.

Step 1: Set up an API key and select a region

Create an API key and export it as an environment variable.

Setting the API key as an environment variable (DASHSCOPE_API_KEY) is more secure than hardcoding it in your code.
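
A small helper like the following (the function name is illustrative, not part of any SDK) fails fast with a clear message when the variable is missing, instead of surfacing an authentication error mid-request:

```python
import os

def require_api_key(env=os.environ):
    """Return the DASHSCOPE_API_KEY value, or raise a clear error if it is not set."""
    key = env.get("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError("Set the DASHSCOPE_API_KEY environment variable first.")
    return key
```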

Step 2: Make a streaming request

OpenAI-compatible

  • How to enable

    Set the stream parameter to true.

  • View token usage

    By default, the OpenAI protocol does not return token usage information. To include token usage in the final data chunk, set stream_options={"include_usage": true}.

Python

import os
from openai import OpenAI

# 1. Preparation: Initialize the client.
client = OpenAI(
    # We recommend configuring the API key as an environment variable to avoid hardcoding it.
    api_key=os.environ["DASHSCOPE_API_KEY"],
    # API keys are region-specific. Make sure base_url matches the region of your API key.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# 2. Make a streaming request.
completion = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Introduce yourself."}
    ],
    stream=True,
    stream_options={"include_usage": True}
)

# 3. Process the streaming response.
# Collecting the chunks in a list and joining them once is more efficient than repeatedly concatenating strings.
content_parts = []
print("AI: ", end="", flush=True)

for chunk in completion:
    if chunk.choices:
        content = chunk.choices[0].delta.content or ""
        print(content, end="", flush=True)
        content_parts.append(content)
    elif chunk.usage:
        print("\n--- Request Usage ---")
        print(f"Input Tokens: {chunk.usage.prompt_tokens}")
        print(f"Output Tokens: {chunk.usage.completion_tokens}")
        print(f"Total Tokens: {chunk.usage.total_tokens}")

full_response = "".join(content_parts)
# print(f"\n--- Full Response ---\n{full_response}")

Sample response

AI: Hello! I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can answer questions, create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, express opinions, and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to let me know!
--- Request Usage ---
Input Tokens: 26
Output Tokens: 87
Total Tokens: 113

Node.js

import OpenAI from "openai";

async function main() {
    // 1. Preparation: Initialize the client.
    // We recommend configuring the API key as an environment variable to avoid hardcoding it.
    if (!process.env.DASHSCOPE_API_KEY) {
        throw new Error("Set the DASHSCOPE_API_KEY environment variable.");
    }
    // API keys are region-specific. Make sure baseURL matches the region of your API key.
    // China (Beijing) region: https://dashscope.aliyuncs.com/compatible-mode/v1
    // Singapore region: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
    const client = new OpenAI({
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    });

    try {
        // 2. Make a streaming request.
        const stream = await client.chat.completions.create({
            model: "qwen-plus",
            messages: [
                { role: "system", content: "You are a helpful assistant." },
                { role: "user", content: "Introduce yourself." },
            ],
            stream: true,
            // Purpose: Get the token usage for this request from the final chunk.
            stream_options: { include_usage: true },
        });

        // 3. Process the streaming response.
        const contentParts = [];
        process.stdout.write("AI: ");
        
        for await (const chunk of stream) {
            // The final chunk contains no choices, but contains the usage information.
            if (chunk.choices && chunk.choices.length > 0) {
                const content = chunk.choices[0]?.delta?.content || "";
                process.stdout.write(content);
                contentParts.push(content);
            } else if (chunk.usage) {
                // The request is complete. Print the token usage.
                console.log("\n--- Request Usage ---");
                console.log(`Input Tokens: ${chunk.usage.prompt_tokens}`);
                console.log(`Output Tokens: ${chunk.usage.completion_tokens}`);
                console.log(`Total Tokens: ${chunk.usage.total_tokens}`);
            }
        }
        
        const fullResponse = contentParts.join("");
        // console.log(`\n--- Full Response ---\n${fullResponse}`);

    } catch (error) {
        console.error("Request failed:", error);
    }
}

main();

Sample response

AI: Hello! I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can answer questions, create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, express opinions, and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to ask me at any time!
--- Request Usage ---
Input Tokens: 26
Output Tokens: 89
Total Tokens: 115

curl

Request

# Make sure the DASHSCOPE_API_KEY environment variable is set.
# API keys are region-specific. Make sure the URL matches the region of your API key.
# China (Beijing) region URL: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# Singapore region URL: https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
--no-buffer \
-d '{
    "model": "qwen-plus",
    "messages": [
        {"role": "user", "content": "Who are you?"}
    ],
    "stream": true,
    "stream_options": {"include_usage": true}
}'

Response

The returned data is a streaming response that follows the SSE protocol. Each line that starts with data: represents one data chunk.

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":" a"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":" large-scale"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":" language model from Alibaba"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":" Cloud, and my name is Qwen"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":"."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"finish_reason":"stop","delta":{"content":""},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":22,"completion_tokens":17,"total_tokens":39},"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: [DONE]

  • data:: The message payload, typically a JSON-formatted string.

  • [DONE]: Indicates that the entire streaming response has ended.
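
A stream in this format can be consumed with a few lines of client-side parsing. The sketch below works on a hard-coded list of SSE lines; in practice the lines would be read from the HTTP response body:

```python
import json

def parse_sse_lines(lines):
    """Extract the text content from 'data:'-prefixed SSE lines."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip ids, comments, and blank lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)

sample = [
    'data: {"choices":[{"delta":{"content":"I am"}}]}',
    'data: {"choices":[{"delta":{"content":" Qwen."}}]}',
    'data: [DONE]',
]
print(parse_sse_lines(sample))  # I am Qwen.
```

The OpenAI and DashScope SDKs perform this parsing for you; the sketch only illustrates what happens under the hood.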

DashScope

  • How to enable

    This depends on the method that you use (Python SDK, Java SDK, or cURL):

    • Python SDK: Set the stream parameter to True.

    • Java SDK: Call the service through the streamCall interface.

    • cURL: Set the X-DashScope-SSE header to enable.

  • Incremental output

    The DashScope protocol supports both incremental and non-incremental streaming output.

    • Incremental (recommended): Each data chunk contains only the newly generated content. To enable incremental streaming, set incremental_output to true.

      Example: ["I ","like ","apples"]
    • Non-incremental: Each data chunk contains all content generated so far. This wastes network bandwidth and increases the client-side processing load. To enable non-incremental streaming, set incremental_output to false.

      Example: ["I ","I like ","I like apples"]
  • View token usage

    Each data chunk includes real-time token usage information.
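
The two modes differ only in how the client reconstructs the final text, which the following sketch illustrates with hard-coded chunks (no request is sent):

```python
# Incremental: each chunk is new text only; join everything.
incremental_chunks = ["I ", "like ", "apples"]
full_from_incremental = "".join(incremental_chunks)

# Non-incremental: each chunk repeats all text so far; the last chunk is the full answer.
non_incremental_chunks = ["I ", "I like ", "I like apples"]
full_from_non_incremental = non_incremental_chunks[-1]

assert full_from_incremental == full_from_non_incremental == "I like apples"
```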

Python

import os
from http import HTTPStatus
import dashscope
from dashscope import Generation

# 1. Preparation: Configure the API key and region.
# We recommend configuring the API key as an environment variable to avoid hardcoding it.
try:
    dashscope.api_key = os.environ["DASHSCOPE_API_KEY"]
except KeyError:
    raise ValueError("Set the DASHSCOPE_API_KEY environment variable.")

# API keys are region-specific. Make sure base_http_api_url matches the region of your API key.
# China (Beijing) region: https://dashscope.aliyuncs.com/api/v1
# Singapore region: https://dashscope-intl.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

# 2. Make a streaming request.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Introduce yourself."},
]

try:
    responses = Generation.call(
        model="qwen-plus",
        messages=messages,
        result_format="message",
        stream=True,
        # Important: Set to True for incremental output and better performance.
        incremental_output=True,
    )

    # 3. Process the streaming response.
    content_parts = []
    print("AI: ", end="", flush=True)

    for resp in responses:
        if resp.status_code == HTTPStatus.OK:
            content = resp.output.choices[0].message.content
            print(content, end="", flush=True)
            content_parts.append(content)

            # Check whether this is the final packet.
            if resp.output.choices[0].finish_reason == "stop":
                usage = resp.usage
                print("\n--- Request Usage ---")
                print(f"Input Tokens: {usage.input_tokens}")
                print(f"Output Tokens: {usage.output_tokens}")
                print(f"Total Tokens: {usage.total_tokens}")
        else:
            # Handle the error.
            print(
                f"\nRequest failed: request_id={resp.request_id}, code={resp.code}, message={resp.message}"
            )
            break

    full_response = "".join(content_parts)
    # print(f"\n--- Full Response ---\n{full_response}")

except Exception as e:
    print(f"An unknown error occurred: {e}")

Sample response

AI: Hello! I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can help you answer questions and create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, express opinions, and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to ask me at any time!
--- Request Usage ---
Input Tokens: 26
Output Tokens: 91
Total Tokens: 117

Java

import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import io.reactivex.Flowable;
import io.reactivex.schedulers.Schedulers;

import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import com.alibaba.dashscope.protocol.Protocol;

public class Main {
    public static void main(String[] args) {
        // 1. Get the API key.
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        if (apiKey == null || apiKey.isEmpty()) {
            System.err.println("Set the DASHSCOPE_API_KEY environment variable.");
            return;
        }

        // 2. Initialize the Generation instance.
        // The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace baseUrl with: https://dashscope.aliyuncs.com/api/v1
        // API keys are region-specific. Make sure baseUrl matches the region of your API key.
        Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
        CountDownLatch latch = new CountDownLatch(1);

        // 3. Build the request parameters.
        GenerationParam param = GenerationParam.builder()
                .apiKey(apiKey)
                .model("qwen-plus")
                .messages(Arrays.asList(
                        Message.builder()
                                .role(Role.USER.getValue())
                                .content("Introduce yourself.")
                                .build()
                ))
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)
                .incrementalOutput(true) // Enable incremental output for streaming.
                .build();
        // 4. Make the streaming call and process the response.
        try {
            Flowable<GenerationResult> result = gen.streamCall(param);
            StringBuilder fullContent = new StringBuilder();
            System.out.print("AI: ");
            result
                    .subscribeOn(Schedulers.io()) // The request runs on an I/O thread.
                    .observeOn(Schedulers.computation()) // The response is processed on a computation thread.
                    .subscribe(
                            // onNext: Process each response chunk.
                            message -> {
                                String content = message.getOutput().getChoices().get(0).getMessage().getContent();
                                String finishReason = message.getOutput().getChoices().get(0).getFinishReason();
                                // Print the content.
                                System.out.print(content);
                                fullContent.append(content);
                                // A non-null finishReason marks the final chunk. Print the usage information.
                                if (finishReason != null && !"null".equals(finishReason)) {
                                    System.out.println("\n--- Request Usage ---");
                                    System.out.println("Input Tokens: " + message.getUsage().getInputTokens());
                                    System.out.println("Output Tokens: " + message.getUsage().getOutputTokens());
                                    System.out.println("Total Tokens: " + message.getUsage().getTotalTokens());
                                }
                                System.out.flush(); // Segera flush output.
                            },
                            // onError: Handle errors.
                            error -> {
                                System.err.println("\nRequest failed: " + error.getMessage());
                                latch.countDown();
                            },
                            // onComplete: Callback on completion.
                            () -> {
                                System.out.println(); // New line.
                                // System.out.println("Full response: " + fullContent.toString());
                                latch.countDown();
                            }
                    );
            // The main thread waits for the asynchronous task to finish.
            latch.await();
            System.out.println("Program execution finished.");
        } catch (Exception e) {
            System.err.println("Request exception: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Sample response

AI: Hello! I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can help you answer questions and create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, express opinions, and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to ask me at any time!
--- Request Usage ---
Input Tokens: 26
Output Tokens: 91
Total Tokens: 117

curl

Request

# Make sure the DASHSCOPE_API_KEY environment variable is set.
# API keys are region-specific. Make sure the URL matches the region of your API key.
# China (Beijing) region URL: https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# Singapore region URL: https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "qwen-plus",
    "input":{
        "messages":[      
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Who are you?"
            }
        ]
    },
    "parameters": {
        "result_format": "message",
        "incremental_output":true
    }
}'

Response

The response follows the Server-Sent Events (SSE) format. Each message includes the following:

  • id: The sequence number of the data chunk.

  • event: The event type, which is always result.

  • The HTTP status code information.

  • data: The JSON-formatted data payload.

id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"I am","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":27,"output_tokens":1,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" Qwen","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":30,"output_tokens":4,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}

id:3
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":", an Alibaba","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":33,"output_tokens":7,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}

...


id:13
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" or need help, feel free to","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":90,"output_tokens":64,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}

id:14
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" ask me!","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":92,"output_tokens":66,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}

id:15
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":92,"output_tokens":66,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}

For multimodal models

Note
  • This section applies to the Qwen-VL, Qwen-VL-OCR, and Qwen3-Omni-Captioner models.

  • Qwen-Omni supports only streaming output. Because its output can contain multimodal content such as text or audio, the logic for parsing its results differs slightly from other models. For more information, see omni-modal.

Multimodal models let you add content such as images and audio to a conversation. Their streaming output implementation differs from text models in the following ways:

  • User message construction: Input for multimodal models includes multimodal content, such as images and audio, in addition to text.

  • DashScope SDK interface: With the DashScope Python SDK, call the MultiModalConversation interface. With the DashScope Java SDK, call the MultiModalConversation class.

OpenAI-compatible

Python

from openai import OpenAI
import os

client = OpenAI(
    # API keys differ between the Singapore and China (Beijing) regions. To get an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the base URL for the Singapore region. If you use a model in the China (Beijing) region, replace base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-vl-plus",  # You can replace this with another multimodal model and adjust the messages as needed.
    messages=[
        {"role": "user",
        "content": [{"type": "image_url",
                    "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},},
                    {"type": "text", "text": "What scene is depicted in the image?"}]}],
    stream=True,
  # stream_options={"include_usage": True}
)
full_content = ""
print("Streaming output content:")
for chunk in completion:
    # If stream_options.include_usage is True, the choices field of the final chunk is an empty list and must be skipped. You can get the token usage from chunk.usage.
    # A truthiness check also skips chunks whose delta.content is None or empty.
    if chunk.choices and chunk.choices[0].delta.content:
        full_content += chunk.choices[0].delta.content
        print(chunk.choices[0].delta.content)
print(f"Full content: {full_content}")

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys differ between the Singapore and China (Beijing) regions. To get an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        // If you have not configured an environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the base URL for the Singapore region. If you use a model in the China (Beijing) region, replace baseURL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const completion = await openai.chat.completions.create({
    model: "qwen3-vl-plus",  // You can replace this with another multimodal model and adjust the messages as needed.
    messages: [
        {role: "user",
        content: [{"type": "image_url",
                    "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},},
                    {"type": "text", "text": "What scene is depicted in the image?"}]}],
    stream: true,
    // stream_options: { include_usage: true },
});

let fullContent = ""
console.log("Streaming output content:")
for await (const chunk of completion) {
    // If stream_options.include_usage is true, the choices field of the final chunk is an empty array and must be skipped. You can get the token usage from chunk.usage.
    if (chunk.choices[0] && chunk.choices[0].delta.content != null) {
      fullContent += chunk.choices[0].delta.content;
      console.log(chunk.choices[0].delta.content);
    }
}
console.log(`Full output content: ${fullContent}`)

curl

# ======= Important =======
# The following is the base URL for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# API keys differ between the Singapore and China (Beijing) regions. To get an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Remove these comments before execution ===

curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen3-vl-plus",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
          }
        },
        {
          "type": "text",
          "text": "What scene is depicted in the image?"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{"include_usage":true}
}'

DashScope

Python

import os
from dashscope import MultiModalConversation
import dashscope
# The following is the base URL for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
            {"text": "What scene is depicted in the image?"}
        ]
    }
]

responses = MultiModalConversation.call(
    # API keys differ between the Singapore and China (Beijing) regions. To get an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model='qwen3-vl-plus',  # You can replace this with another multimodal model and adjust the messages as needed.
    messages=messages,
    stream=True,
    incremental_output=True)
    
full_content = ""
print("Streaming output content:")
for response in responses:
    if response.output.choices[0].message.content:
        print(response.output.choices[0].message.content[0]['text'])
        full_content += response.output.choices[0].message.content[0]['text']
print(f"Full content: {full_content}")

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    static {
        // The following is the base URL for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }
    public static void streamCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        // must create mutable map.
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"),
                        Collections.singletonMap("text", "What scene is depicted in the image?"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys differ between the Singapore and China (Beijing) regions. To get an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-vl-plus")  // You can replace this with another multimodal model and adjust the messages as needed.
                .messages(Arrays.asList(userMessage))
                .incrementalOutput(true)
                .build();
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(item -> {
            try {
                List<Map<String, Object>> content = item.getOutput().getChoices().get(0).getMessage().getContent();
                // Check that the content exists and is not empty.
                if (content != null && !content.isEmpty()) {
                    System.out.println(content.get(0).get("text"));
                }
            } catch (Exception e) {
                System.exit(0);
            }
        });
    }

    public static void main(String[] args) {
        try {
            streamCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= Important =======
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the base URL for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Remove these comments before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
                    {"text": "What scene is depicted in the image?"}
                ]
            }
        ]
    },
    "parameters": {
        "incremental_output": true
    }
}'

For thinking models

Thinking models first return reasoning_content (the thinking process), followed by content (the response). You can determine whether the model is in the thinking stage or the answering stage from the state of each data packet.

For more information about thinking models, see deep thinking, visual understanding, and visual reasoning.
To implement streaming output for Qwen3-Omni-Flash (thinking mode), see omni-modal.

OpenAI compatible

The following is the response format when you use the OpenAI Python SDK to call the qwen-plus model in thinking mode with streaming output:

# Thinking stage
...
ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content='Cover all key points while')
ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content='being natural and fluent.')
# Answering stage
ChoiceDelta(content='Hello! I am **Q', function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content=None)
ChoiceDelta(content='wen** (', function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content=None)
...
  • If reasoning_content is not None and content is None, the model is in the thinking stage.

  • If reasoning_content is None and content is not None, the model is in the answering stage.

  • If both are None, the stage is the same as in the previous packet.
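
The three rules above can be sketched as a small helper. This is a minimal sketch over plain values; the `classify_delta` function and its inputs are illustrative, not part of the OpenAI SDK:

```python
def classify_delta(reasoning_content, content, previous_stage):
    """Classify a streamed delta as 'thinking', 'answering', or unchanged."""
    if reasoning_content is not None and content is None:
        return "thinking"
    if reasoning_content is None and content is not None:
        return "answering"
    # Both None: the stage carries over from the previous packet.
    return previous_stage

stage = None
for reasoning, content in [("Let me think", None), (None, None), (None, "Hello!")]:
    stage = classify_delta(reasoning, content, stage)
    print(stage)
# Prints: thinking, thinking, answering
```

In a real stream you would pass `delta.reasoning_content` and `delta.content` from each chunk, as in the SDK examples below.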

Python

Sample code

from openai import OpenAI
import os

# Initialize the OpenAI client.
client = OpenAI(
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

messages = [{"role": "user", "content": "Who are you?"}]

completion = client.chat.completions.create(
    model="qwen-plus",  # You can replace this with another deep thinking model as needed.
    messages=messages,
    # The enable_thinking parameter enables the thinking process. This parameter is not supported for the qwen3-30b-a3b-thinking-2507, qwen3-235b-a22b-thinking-2507, and QwQ models.
    extra_body={"enable_thinking": True},
    stream=True,
    # stream_options={
    #     "include_usage": True
    # },
)

reasoning_content = ""  # The complete thinking process
answer_content = ""  # The complete response
is_answering = False  # Whether the answering stage has started
print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")

for chunk in completion:
    if not chunk.choices:
        print("\nUsage:")
        print(chunk.usage)
        continue

    delta = chunk.choices[0].delta

    # Collect only the thinking content.
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content

    # Content received: the answering stage starts.
    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content

Sample response

====================Thinking process====================

Okay, the user is asking "Who are you?". I need to provide an accurate and friendly answer. First, I must confirm my identity: Qwen, developed by the Tongyi Lab at Alibaba Group. Next, I should explain my main functions, such as answering questions, creating text, and logical reasoning. I need to maintain a friendly tone and avoid being too technical to make the user feel at ease. I should also avoid complex jargon to keep the answer simple and clear. Additionally, I might add some interactive elements, inviting the user to ask more questions to encourage further conversation. Finally, I'll check if I've missed any important information, such as my Chinese name "Tongyi Qianwen" and English name "Qwen", and my parent company and lab. I need to ensure the answer is comprehensive and meets the user's expectations.
====================Full response====================

Hello! I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can answer questions, create text, perform logical reasoning, write code, and more, all to provide users with high-quality information and services. You can call me Qwen, or just Tongyi Qianwen. How can I help you?

Node.js

Sample code

import OpenAI from "openai";
import process from 'process';

// Initialize the OpenAI client.
const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY, // Read from an environment variable
    baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = '';
let answerContent = '';
let isAnswering = false;

async function main() {
    try {
        const messages = [{ role: 'user', content: 'Who are you?' }];
        const stream = await openai.chat.completions.create({
            // You can replace this with another Qwen3 or QwQ model as needed.
            model: 'qwen-plus',
            messages,
            stream: true,
            // The enable_thinking parameter enables the thinking process. This parameter is not supported for the qwen3-30b-a3b-thinking-2507, qwen3-235b-a22b-thinking-2507, and QwQ models.
            enable_thinking: true
        });
        console.log('\n' + '='.repeat(20) + 'Thinking process' + '='.repeat(20) + '\n');

        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\nUsage:');
                console.log(chunk.usage);
                continue;
            }

            const delta = chunk.choices[0].delta;
            
            // Collect only the thinking content.
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }

            // Start answering after content is received.
            if (delta.content !== undefined && delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();

Sample response

====================Thinking process====================

Okay, the user is asking "Who are you?". I need to answer with my identity. First, I should clearly state that I am Qwen, a large-scale language model developed by Alibaba Cloud. Next, I can mention my main functions, such as answering questions, creating text, and logical reasoning. I should also emphasize my multilingual support, including Chinese and English, so the user knows I can handle requests in different languages. Additionally, I might need to explain my application scenarios, such as helping with study, work, and daily life. However, the user's question is quite direct, so I should keep it concise. I also need to ensure a friendly tone and invite the user to ask further questions. I will check for any missing important information, such as my version or latest updates, but the user probably doesn't need that level of detail. Finally, I will confirm the answer is accurate and free of errors.
====================Full response====================

I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can perform various tasks such as answering questions, creating text, logical reasoning, and coding. I support multiple languages, including Chinese and English. If you have any questions or need help, feel free to let me know!

HTTP

Sample code

curl

For open-source Qwen3 models, set enable_thinking to true to enable thinking mode. The enable_thinking parameter has no effect on the qwen3-30b-a3b-thinking-2507, qwen3-235b-a22b-thinking-2507, and QwQ models.

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "user", 
            "content": "Who are you?"
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "enable_thinking": true
}'

Sample response

data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}

.....

data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}

data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":10,"completion_tokens":360,"total_tokens":370},"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}

data: [DONE]
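
Each event in the stream above is a `data:` line, with a literal `[DONE]` sentinel marking the end. The parsing logic can be sketched as follows; the `parse_sse_lines` helper is illustrative, not part of any SDK:

```python
import json

def parse_sse_lines(lines):
    """Yield parsed JSON payloads from SSE 'data:' lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # Skip ids, comments, and blank keep-alive lines.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # End-of-stream sentinel sent by the server.
        yield json.loads(payload)

# Example with two content chunks followed by the sentinel:
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: [DONE]',
]
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_lines(sample))
print(text)  # Hello
```

In production, read lines incrementally from the HTTP response body instead of a list, and note that some chunks (for example the final usage-only chunk) have an empty `choices` array.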

DashScope

The following is the data format when you use the DashScope Python SDK to call the qwen-plus model in thinking mode:

# Thinking stage
...
{"role": "assistant", "content": "", "reasoning_content": "informative, "}
{"role": "assistant", "content": "", "reasoning_content": "so the user finds it helpful."}
# Answering stage
{"role": "assistant", "content": "I am Qwen", "reasoning_content": ""}
{"role": "assistant", "content": ", developed by Tongyi Lab", "reasoning_content": ""}
...
  • If reasoning_content is not an empty string and content is an empty string, the model is in the thinking stage.

  • If reasoning_content is an empty string and content is not an empty string, the model is in the answering stage.

  • If both are empty strings, the stage is the same as in the previous packet.
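
Note that DashScope signals the stage with empty strings rather than None. A minimal sketch of the same classification under this convention (the `classify_message` helper is illustrative, not part of the DashScope SDK):

```python
def classify_message(reasoning_content, content, previous_stage):
    """Classify a DashScope streamed message by its empty-string semantics."""
    if reasoning_content != "" and content == "":
        return "thinking"
    if reasoning_content == "" and content != "":
        return "answering"
    # Both empty: the stage carries over from the previous packet.
    return previous_stage

stage = None
for msg in [{"content": "", "reasoning_content": "Hmm"},
            {"content": "", "reasoning_content": ""},
            {"content": "I am Qwen", "reasoning_content": ""}]:
    stage = classify_message(msg["reasoning_content"], msg["content"], stage)
    print(stage)
# Prints: thinking, thinking, answering
```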

Python

Sample code

import os
from dashscope import Generation
import dashscope
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"

messages = [{"role": "user", "content": "Who are you?"}]

completion = Generation.call(
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # You can replace this with another deep thinking model as needed.
    model="qwen-plus",
    messages=messages,
    result_format="message", # Open-source Qwen3 models support only "message". For a better experience, we recommend setting this parameter to "message" for other models as well.
    # Enable deep thinking. This parameter has no effect on the qwen3-30b-a3b-thinking-2507, qwen3-235b-a22b-thinking-2507, and QwQ models.
    enable_thinking=True,
    stream=True,
    incremental_output=True, # Open-source Qwen3 models support only True. For a better experience, we recommend setting this parameter to True for other models as well.
)

# Holds the complete thinking process.
reasoning_content = ""
# Holds the complete response.
answer_content = ""
# Whether the thinking process has finished and the response has started.
is_answering = False

print("=" * 20 + "Thinking process" + "=" * 20)

for chunk in completion:
    # If both the thinking content and the response are empty, do nothing.
    if (
        chunk.output.choices[0].message.content == ""
        and chunk.output.choices[0].message.reasoning_content == ""
    ):
        pass
    else:
        # If the current chunk is part of the thinking process.
        if (
            chunk.output.choices[0].message.reasoning_content != ""
            and chunk.output.choices[0].message.content == ""
        ):
            print(chunk.output.choices[0].message.reasoning_content, end="", flush=True)
            reasoning_content += chunk.output.choices[0].message.reasoning_content
        # If the current chunk is part of the response.
        elif chunk.output.choices[0].message.content != "":
            if not is_answering:
                print("\n" + "=" * 20 + "Full response" + "=" * 20)
                is_answering = True
            print(chunk.output.choices[0].message.content, end="", flush=True)
            answer_content += chunk.output.choices[0].message.content

# To print the complete thinking process and response, uncomment and run the following code.
# print("=" * 20 + "Full thinking process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Full response" + "=" * 20 + "\n")
# print(f"{answer_content}")

Sample response

====================Thinking process====================
Okay, the user is asking, "Who are you?" I need to answer this question. First, I must clarify my identity: Qwen, a large-scale language model developed by Alibaba Cloud. Next, I should explain my functions and purposes, such as answering questions, creating text, and logical reasoning. I should also emphasize my goal of being a helpful assistant to users, providing help and support.

When responding, I should maintain a conversational tone and avoid using technical jargon or complex sentence structures. I can use friendly expressions, like "Hello there! ~", to make the conversation more natural. I also need to ensure the information is accurate and does not omit key points, such as my developer, main functions, and application scenarios.

I should also consider potential follow-up questions from the user, such as specific application examples or technical details. So, I can subtly set up opportunities in my answer to guide the user to ask more questions. For example, by mentioning, "Whether it's a question about daily life or a professional field, I can do my best to help," which is both comprehensive and open-ended.

Finally, I will check if the response is fluent, without repetition or redundancy, ensuring it is concise and clear. I will also maintain a balance between being friendly and professional, so the user feels that I am both approachable and reliable.
====================Full response====================
Hello there! ~ I am Qwen, a large-scale language model developed by Alibaba Cloud. I can answer questions, create text, perform logical reasoning, write code, and more, all to provide help and support to users. Whether it's a question about daily life or a professional field, I can do my best to help. How can I assist you?

Java

Sample code

// dashscope SDK version >= 2.19.4
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    static {
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;

    private static void handleGenerationResult(GenerationResult message) {
        String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String content = message.getOutput().getChoices().get(0).getMessage().getContent();

        if (!reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Thinking process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }

        if (!content.isEmpty()) {
            finalContent.append(content);
            if (!isFirstPrint) {
                System.out.println("\n====================Full response====================");
                isFirstPrint = true;
            }
            System.out.print(content);
        }
    }
    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                // If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-plus")
                .enableThinking(true)
                .incrementalOutput(true)
                .resultFormat("message")
                .messages(Arrays.asList(userMsg))
                .build();
    }
    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.blockingForEach(message -> handleGenerationResult(message));
    }

    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
            streamCallWithMessage(gen, userMsg);
//            Print the final result.
//            if (reasoningContent.length() > 0) {
//                System.out.println("\n====================Full response====================");
//                System.out.println(finalContent.toString());
//            }
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            logger.error("An exception occurred: {}", e.getMessage());
        }
        System.exit(0);
    }
}

Result

====================Thinking process====================
Okay, the user is asking "Who are you?". I need to answer based on my previous settings. First, my role is Qwen, a large-scale language model from Alibaba Group. I need to keep my language conversational, simple, and easy to understand.

The user might be new to me or wants to confirm my identity. I should first directly answer who I am, then briefly explain my functions and uses, such as answering questions, creating text, and coding. I also need to mention my multilingual support so the user knows I can handle requests in different languages.

Also, according to the guidelines, I need to maintain a human-like personality, so my tone should be friendly, and I might use emojis to add a touch of warmth. I might also need to guide the user to ask further questions or use my functions, for example, by asking them what they need help with.

I need to be careful not to use complex jargon and avoid being long-winded. I will check for any missed key points, such as multilingual support and specific capabilities. I will ensure the answer meets all requirements, including being conversational and concise.
====================Full response====================
Hello! I am Qwen, a large-scale language model from Alibaba Group. I can answer questions, create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, express opinions, and play games. I am proficient in multiple languages, including but not limited to Chinese, English, German, French, and Spanish. Is there anything I can help you with?

HTTP

Sample code

curl

For hybrid thinking models, set enable_thinking to true to enable thinking mode. The enable_thinking parameter has no effect on the qwen3-30b-a3b-thinking-2507, qwen3-235b-a22b-thinking-2507, and QwQ models.

# ======= Important =======
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === Remove these comments before execution ===

curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "qwen-plus",
    "input":{
        "messages":[      
            {
                "role": "user",
                "content": "Who are you?"
            }
        ]
    },
    "parameters":{
        "enable_thinking": true,
        "incremental_output": true,
        "result_format": "message"
    }
}'

Sample response

id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Hmm","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"input_tokens":11,"output_tokens":3},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"input_tokens":11,"output_tokens":4},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:3
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":" the user","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":16,"input_tokens":11,"output_tokens":5},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:4
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":" is asking","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":17,"input_tokens":11,"output_tokens":6},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:5
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":" '","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":18,"input_tokens":11,"output_tokens":7},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
......

id:358
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" help","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":373,"input_tokens":11,"output_tokens":362},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:359
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":",","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":374,"input_tokens":11,"output_tokens":363},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:360
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" feel free to","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":375,"input_tokens":11,"output_tokens":364},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:361
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" let me","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":376,"input_tokens":11,"output_tokens":365},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:362
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" know","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":377,"input_tokens":11,"output_tokens":366},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:363
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:364
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

Go to production

  • Performance and resource management: In a backend service, each streaming request holds a persistent HTTP connection, which consumes resources. Configure your service with appropriate connection pool sizes and timeout periods. In high-concurrency scenarios, monitor the service's file descriptor usage to prevent exhaustion.

  • Client-side rendering: In a web front end, use the ReadableStream and TextDecoderStream APIs to process and render the SSE event stream smoothly for an optimal user experience.

  • Usage and performance monitoring:

    • Key metrics: Monitor time to first token (TTFT), the core metric for the streaming experience, along with the API error rate and average response time.

    • Alerting: Set alerts for abnormal API error rates, especially 4xx and 5xx errors.

  • Nginx proxy configuration: If you use Nginx as a reverse proxy, its default output buffering (proxy_buffering) interferes with real-time streaming responses. To ensure data is pushed to the client immediately, set proxy_buffering off in the Nginx configuration file.
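
For reference, a minimal Nginx location block for proxying an SSE endpoint might look like the following. This is a sketch only; the upstream name and path are placeholders, not part of Model Studio:

```nginx
location /chat/ {
    proxy_pass http://llm_backend;      # Placeholder upstream name
    proxy_http_version 1.1;             # Needed for persistent upstream connections
    proxy_set_header Connection '';     # Keep the upstream connection open
    proxy_buffering off;                # Push SSE chunks to the client immediately
    proxy_cache off;                    # Do not cache streamed responses
    proxy_read_timeout 300s;            # Allow long-lived streams
}
```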

Error codes

If a call fails, see Error messages for troubleshooting.

FAQ

Q: Why does the returned data not include usage information?

A: By default, the OpenAI protocol does not return usage information. Set the stream_options parameter to include usage information in the final packet.
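
With stream_options={"include_usage": True}, the final chunk has an empty choices list and a populated usage field, as shown in the sample responses above. A minimal sketch over mock chunk dicts (the `extract_usage` helper and the mock data are illustrative):

```python
def extract_usage(chunks):
    """Return the usage payload from the final chunk of a stream.

    With stream_options={"include_usage": True}, the server appends one
    extra chunk whose `choices` list is empty and whose `usage` field is set.
    """
    usage = None
    for chunk in chunks:
        if not chunk.get("choices") and chunk.get("usage"):
            usage = chunk["usage"]
    return usage

# Mock stream: two content chunks, then the usage-only chunk.
stream = [
    {"choices": [{"delta": {"content": "Hi"}}], "usage": None},
    {"choices": [{"delta": {"content": "!"}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 10, "completion_tokens": 2, "total_tokens": 12}},
]
print(extract_usage(stream))  # {'prompt_tokens': 10, 'completion_tokens': 2, 'total_tokens': 12}
```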

Q: Does streaming output affect the quality of model responses?

A: No. However, some models support only streaming output, and non-streaming calls may cause timeout errors. We recommend using streaming output.