Streaming output for Qwen models - Alibaba Cloud Model Studio

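In applications such as real-time chat or long-form text generation, long waits degrade the user experience and can trigger server-side timeouts that cause the task to fail. Streaming output solves both problems by continuously returning chunks of text as the model generates them.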

How it works

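Streaming output is based on the Server-Sent Events (SSE) protocol. When you make a streaming request, the server establishes a persistent HTTP connection with the client. Each time the model generates a block of text (also called a chunk), the server immediately pushes it over the connection. After all content has been generated, the server sends an end signal.

The client listens to the event stream and receives and processes chunks in real time, for example by rendering text character by character in the interface. This contrasts with a non-streaming call, which returns all content at once.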

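Billing

Billing rules for streaming output are the same as for non-streaming calls. You are billed based on the number of input tokens and output tokens in the request.

If a request is interrupted, you are billed only for the output tokens generated before the server receives the stop request.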

How to use

Important

Some models support only streaming calls: the open-source versions of Qwen3, QwQ, QVQ, and the commercial and open-source versions of Qwen-Omni.

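Step 1: Set up an API key and select a region

You must create and configure an API key and export it as an environment variable.

Setting the API key as an environment variable (DASHSCOPE_API_KEY) is safer than hardcoding it in your code.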
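
As a small illustration of the recommendation above, the following sketch reads the key from the environment and fails fast when it is missing. The helper name load_api_key and its env parameter are hypothetical and exist only for this example; only the DASHSCOPE_API_KEY variable name comes from this page.

```python
import os


def load_api_key(env=None, name="DASHSCOPE_API_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    `env` is a hypothetical parameter (defaults to os.environ) that makes
    the helper easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable.")
    return key


# Usage with an explicit mapping, so the example does not depend on your shell.
print(load_api_key({"DASHSCOPE_API_KEY": "sk-example"}))
```

Failing at startup, rather than at the first request, tends to make a missing or empty key easier to diagnose.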

Step 2: Make a streaming request

OpenAI compatible

How to enable
Set the stream parameter to true.
View token usage
By default, the OpenAI protocol does not return token usage. To include token usage information in the final data chunk, set stream_options={"include_usage": true}.

Python

import os
from openai import OpenAI

# 1. Preparation: initialize the client.
client = OpenAI(
    # To avoid hardcoding, we recommend setting the API key as an environment variable.
    api_key=os.environ["DASHSCOPE_API_KEY"],
    # API keys are region-specific. Make sure base_url matches the region of your API key.
    # To use a model in the China (Beijing) region, replace base_url with https://dashscope.aliyuncs.com/compatible-mode/v1.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# 2. Make the streaming request.
completion = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Introduce yourself."}
    ],
    stream=True,
    stream_options={"include_usage": True}
)

# 3. Process the streaming response.
# Collecting response chunks in a list and joining them is more efficient than repeated string concatenation.
content_parts = []
print("AI: ", end="", flush=True)

for chunk in completion:
    if chunk.choices:
        content = chunk.choices[0].delta.content or ""
        print(content, end="", flush=True)
        content_parts.append(content)
    elif chunk.usage:
        print("\n--- Request Usage ---")
        print(f"Input Tokens: {chunk.usage.prompt_tokens}")
        print(f"Output Tokens: {chunk.usage.completion_tokens}")
        print(f"Total Tokens: {chunk.usage.total_tokens}")

full_response = "".join(content_parts)
# print(f"\n--- Full Response ---\n{full_response}")

Output

AI: Hello! I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can answer questions, create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, express opinions, and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to let me know!
--- Request Usage ---
Input Tokens: 26
Output Tokens: 87
Total Tokens: 113

Node.js

import OpenAI from "openai";

async function main() {
    // 1. Preparation: initialize the client.
    // To avoid hardcoding, we recommend setting the API key as an environment variable.
    if (!process.env.DASHSCOPE_API_KEY) {
        throw new Error("Set the DASHSCOPE_API_KEY environment variable.");
    }
    // API keys are region-specific. Make sure baseURL matches the region of your API key.
    // To use a model in the China (Beijing) region, replace baseURL with https://dashscope.aliyuncs.com/compatible-mode/v1.
    const client = new OpenAI({
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    });

    try {
        // 2. Make the streaming request.
        const stream = await client.chat.completions.create({
            model: "qwen-plus",
            messages: [
                { role: "system", content: "You are a helpful assistant." },
                { role: "user", content: "Introduce yourself." },
            ],
            stream: true,
            // Purpose: get this request's token usage from the last chunk.
            stream_options: { include_usage: true },
        });

        // 3. Process the streaming response.
        const contentParts = [];
        process.stdout.write("AI: ");
        
        for await (const chunk of stream) {
            // The last chunk has no choices, but it carries the usage information.
            if (chunk.choices && chunk.choices.length > 0) {
                const content = chunk.choices[0]?.delta?.content || "";
                process.stdout.write(content);
                contentParts.push(content);
            } else if (chunk.usage) {
                // The request is complete. Print the token usage.
                console.log("\n--- Request Usage ---");
                console.log(`Input Tokens: ${chunk.usage.prompt_tokens}`);
                console.log(`Output Tokens: ${chunk.usage.completion_tokens}`);
                console.log(`Total Tokens: ${chunk.usage.total_tokens}`);
            }
        }
        
        const fullResponse = contentParts.join("");
        // console.log(`\n--- Full Response ---\n${fullResponse}`);

    } catch (error) {
        console.error("Request failed:", error);
    }
}

main();

Output

AI: Hello! I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can answer questions, create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, express opinions, and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to ask me at any time!
--- Request Usage ---
Input Tokens: 26
Output Tokens: 89
Total Tokens: 115

curl

Request

# ======= Important =======
# Make sure the DASHSCOPE_API_KEY environment variable is set.
# API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# To use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions.
# === Delete this comment before running ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
--no-buffer \
-d '{
    "model": "qwen-plus",
    "messages": [
        {"role": "user", "content": "Who are you?"}
    ],
    "stream": true,
    "stream_options": {"include_usage": true}
}'

Response

The returned data is a streaming response that follows the SSE protocol. Each line that starts with data: represents a data block.

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":" a"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":" large-scale"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":" language model from Alibaba"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":" Cloud, and my name is Qwen"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":"."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"finish_reason":"stop","delta":{"content":""},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":22,"completion_tokens":17,"total_tokens":39},"created":1726132850,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: [DONE]

data: The data payload of the message, usually a JSON-formatted string.
[DONE]: Indicates that the entire streaming response has ended.
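
To make the format above concrete, here is a minimal sketch of how a client might consume such a stream without an SDK. It assumes one JSON payload per data: line, as in the sample response, and the function names parse_openai_sse and collect are hypothetical. The sample lines are abridged from the response shown above.

```python
import json


def parse_openai_sse(lines):
    """Yield decoded JSON payloads from `data:` lines until `[DONE]` arrives."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream marker
        yield json.loads(payload)


def collect(lines):
    """Join the delta contents and pick up usage from the usage-only chunk."""
    parts, usage = [], None
    for chunk in parse_openai_sse(lines):
        for choice in chunk.get("choices", []):
            parts.append(choice["delta"].get("content") or "")
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage


# Abridged sample: two content chunks, the usage-only chunk, then [DONE].
sample = [
    'data: {"choices":[{"delta":{"content":"I am"},"index":0}],"usage":null}',
    "",
    'data: {"choices":[{"delta":{"content":" a"},"index":0}],"usage":null}',
    "",
    'data: {"choices":[],"usage":{"prompt_tokens":22,"completion_tokens":17,"total_tokens":39}}',
    "",
    "data: [DONE]",
]

text, usage = collect(sample)
print(text)
print(usage)
```

In practice the lines would come from the HTTP response body as it arrives. The list here only stands in for that stream.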

DashScope

How to enable
Depending on the method you use (Python SDK, Java SDK, or cURL):
- Python SDK: Set the stream parameter to True.
- Java SDK: Call the service through the streamCall interface.
- cURL: Set the X-DashScope-SSE header to enable.
Enable incremental output
The DashScope protocol supports both incremental and non-incremental streaming output.
- Incremental (recommended): Each data chunk contains only the newly generated content. To enable incremental streaming, set incremental_output to true.
  Example: ["I love", "to eat", "apples"]
- Non-incremental: Each data chunk contains all previously generated content, which wastes network bandwidth and increases the client's processing load. To enable non-incremental streaming, set incremental_output to false.
  Example: ["I ", "I like ", "I like apples"]
View token usage
Each data block contains real-time token usage information.
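
The difference between the two output modes affects how a client reassembles the text. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that every non-incremental chunk extends the previous one.

```python
def merge_incremental(chunks):
    """Incremental mode: each chunk is new text, so the result is the concatenation."""
    return "".join(chunks)


def merge_non_incremental(chunks):
    """Non-incremental mode: each chunk repeats everything so far, so keep the last."""
    return chunks[-1] if chunks else ""


def to_deltas(cumulative_chunks):
    """Turn cumulative chunks into increments (assumes each chunk extends the previous)."""
    deltas, seen = [], ""
    for text in cumulative_chunks:
        deltas.append(text[len(seen):])
        seen = text
    return deltas


cumulative = ["I ", "I like ", "I like apples"]
print(merge_non_incremental(cumulative))
print(to_deltas(cumulative))
print(merge_incremental(to_deltas(cumulative)))
```

A user interface that appends text as it arrives needs the increments, which is one reason the incremental mode is the simpler choice for rendering.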

Python

import os
from http import HTTPStatus
import dashscope
from dashscope import Generation

# 1. Preparation: set the API key and region.
# To avoid hardcoding, we recommend setting the API key as an environment variable.
try:
    dashscope.api_key = os.environ["DASHSCOPE_API_KEY"]
except KeyError:
    raise ValueError("Set the DASHSCOPE_API_KEY environment variable.")

# API keys are region-specific. Make sure the base URL matches the region of your API key.
# To use a model in the China (Beijing) region, replace the base URL with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

# 2. Make the streaming request.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Introduce yourself."},
]

try:
    responses = Generation.call(
        model="qwen-plus",
        messages=messages,
        result_format="message",
        stream=True,
        # Key setting: set to True to get incremental output, which improves performance.
        incremental_output=True,
    )

    # 3. Process the streaming response.
    content_parts = []
    print("AI: ", end="", flush=True)

    for resp in responses:
        if resp.status_code == HTTPStatus.OK:
            content = resp.output.choices[0].message.content
            print(content, end="", flush=True)
            content_parts.append(content)

            # Check whether this is the last packet.
            if resp.output.choices[0].finish_reason == "stop":
                usage = resp.usage
                print("\n--- Request Usage ---")
                print(f"Input Tokens: {usage.input_tokens}")
                print(f"Output Tokens: {usage.output_tokens}")
                print(f"Total Tokens: {usage.total_tokens}")
        else:
            # Handle the error.
            print(
                f"\nRequest failed: request_id={resp.request_id}, code={resp.code}, message={resp.message}"
            )
            break

    full_response = "".join(content_parts)
    # print(f"\n--- Full Response ---\n{full_response}")

except Exception as e:
    print(f"An unknown error occurred: {e}")

Output

AI: Hello! I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can help you answer questions and create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, express opinions, and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to ask me at any time!
--- Request Usage ---
Input Tokens: 26
Output Tokens: 91
Total Tokens: 117

Java

import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import io.reactivex.Flowable;
import io.reactivex.schedulers.Schedulers;

import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import com.alibaba.dashscope.protocol.Protocol;

public class Main {
    public static void main(String[] args) {
        // 1. Get the API key.
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        if (apiKey == null || apiKey.isEmpty()) {
            System.err.println("Set the DASHSCOPE_API_KEY environment variable.");
            return;
        }

        // 2. Initialize the Generation instance.
        // API keys are region-specific. Make sure baseUrl matches the region of your API key.
        // To use a model in the China (Beijing) region, replace baseUrl with https://dashscope.aliyuncs.com/api/v1.
        Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
        CountDownLatch latch = new CountDownLatch(1);

        // 3. Build the request parameters.
        GenerationParam param = GenerationParam.builder()
                .apiKey(apiKey)
                .model("qwen-plus")
                .messages(Arrays.asList(
                        Message.builder()
                                .role(Role.USER.getValue())
                                .content("Introduce yourself.")
                                .build()
                ))
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)
                .incrementalOutput(true) // Enable incremental streaming output.
                .build();
        // 4. Make the streaming call and process the response.
        try {
            Flowable<GenerationResult> result = gen.streamCall(param);
            StringBuilder fullContent = new StringBuilder();
            System.out.print("AI: ");
            result
                    .subscribeOn(Schedulers.io()) // The request runs on an I/O thread.
                    .observeOn(Schedulers.computation()) // The response is processed on a computation thread.
                    .subscribe(
                            // onNext: process each response chunk.
                            message -> {
                                String content = message.getOutput().getChoices().get(0).getMessage().getContent();
                                String finishReason = message.getOutput().getChoices().get(0).getFinishReason();
                                // Print the content.
                                System.out.print(content);
                                fullContent.append(content);
                                // A non-null finishReason marks the last chunk. Print the usage information.
                                if (finishReason != null && !"null".equals(finishReason)) {
                                    System.out.println("\n--- Request Usage ---");
                                    System.out.println("Input Tokens: " + message.getUsage().getInputTokens());
                                    System.out.println("Output Tokens: " + message.getUsage().getOutputTokens());
                                    System.out.println("Total Tokens: " + message.getUsage().getTotalTokens());
                                }
                                System.out.flush(); // Flush the output immediately.
                            },
                            // onError: handle the error.
                            error -> {
                                System.err.println("\nRequest failed: " + error.getMessage());
                                latch.countDown();
                            },
                            // onComplete: callback on completion.
                            () -> {
                                System.out.println(); // Line break.
                                // System.out.println("Full response: " + fullContent.toString());
                                latch.countDown();
                            }
                    );
            // The main thread waits for the asynchronous task to finish.
            latch.await();
            System.out.println("Program execution finished.");
        } catch (Exception e) {
            System.err.println("Request exception: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Output

AI: Hello! I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can help you answer questions and create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, express opinions, and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to ask me at any time!
--- Request Usage ---
Input Tokens: 26
Output Tokens: 91
Total Tokens: 117

curl

Request

# ======= Important =======
# Make sure the DASHSCOPE_API_KEY environment variable is set.
# API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# To use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation.
# === Delete this comment before running ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "qwen-plus",
    "input":{
        "messages":[      
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Who are you?"
            }
        ]
    },
    "parameters": {
        "result_format": "message",
        "incremental_output":true
    }
}'

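Response

The response follows the SSE format. Each message contains the following:

id: The data block number.
event: The event type. Always result.
HTTP status code information (the line that starts with :HTTP_STATUS/).
data: The JSON-formatted data payload.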

id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"I am","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":27,"output_tokens":1,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" Qwen","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":30,"output_tokens":4,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}

id:3
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":", an Alibaba","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":33,"output_tokens":7,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}

...


id:13
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" or need help, feel free to","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":90,"output_tokens":64,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}

id:14
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" ask me!","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":92,"output_tokens":66,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}

id:15
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":92,"output_tokens":66,"input_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"request_id":"d30a9914-ac97-9102-b746-ce0cb35e3fa2"}
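
The message layout above can be parsed with a few lines of code. The following is a sketch under stated assumptions rather than the SDK's own parser: the function name parse_sse_events is hypothetical, events are assumed to be separated by blank lines, and lines that start with a colon (such as :HTTP_STATUS/200) are skipped because the SSE format treats them as comments. The sample is adapted and shortened from the response above.

```python
import json
import re


def parse_sse_events(body):
    """Parse an SSE body into a list of {field: value} dicts, one per event."""
    events = []
    # Events are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", body.strip()):
        event = {}
        for line in block.splitlines():
            if not line or line.startswith(":"):
                continue  # a leading ":" marks an SSE comment, e.g. ":HTTP_STATUS/200"
            field, _, value = line.partition(":")
            # One optional space after the colon is not part of the value.
            event[field] = value[1:] if value.startswith(" ") else value
        if "data" in event:
            event["data"] = json.loads(event["data"])
        if event:
            events.append(event)
    return events


# Adapted sample: two events, the second one closing the stream.
sample = """\
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"I am","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":27,"output_tokens":1,"input_tokens":26}}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":" Qwen","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":30,"output_tokens":4,"input_tokens":26}}
"""

events = parse_sse_events(sample)
text = "".join(e["data"]["output"]["choices"][0]["message"]["content"] for e in events)
last = events[-1]["data"]
print(text)
print(last["output"]["choices"][0]["finish_reason"], last["usage"]["total_tokens"])
```

Note that in the sample response the intermediate chunks carry finish_reason as the string "null", so a client comparing against "stop" (or another non-"null" value) is safer than checking for a missing field.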

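Streaming output for multimodal models

Note

This section applies to the Qwen-VL, Qwen-VL-OCR, Kimi-K2.5, and Qwen3-Omni-Captioner models.
Qwen-Omni supports only streaming output. Because its output can include multimodal content such as text and audio, the logic for parsing the returned results differs from that of other models. For more information, see "Omni-modal".

Multimodal models let you add content such as images and audio to a conversation. Their streaming implementation differs from that of text models in the following ways:

User message construction: In addition to text, the input to a multimodal model includes multimodal content such as images and audio.
DashScope SDK interface: With the DashScope Python SDK, call the MultiModalConversation interface. With the DashScope Java SDK, call the MultiModalConversation class.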
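
The two protocols shape the user message differently. The sketch below builds the same image-plus-text message in both shapes, mirroring the samples in this section. The helper names are hypothetical, and only the image and text content types used on this page are covered.

```python
def openai_image_message(image_url, prompt):
    """User message in the OpenAI-compatible shape (typed content parts)."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": prompt},
        ],
    }


def dashscope_image_message(image_url, prompt):
    """User message in the DashScope shape (one key per modality)."""
    return {
        "role": "user",
        "content": [{"image": image_url}, {"text": prompt}],
    }


url = "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
question = "What scene is depicted in the image?"
print(openai_image_message(url, question))
print(dashscope_image_message(url, question))
```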

OpenAI compatible

Python

from openai import OpenAI
import os

client = OpenAI(
    # API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
    # If you have not set the environment variable, replace the next line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # To use a model in the China (Beijing) region, replace base_url with https://dashscope.aliyuncs.com/compatible-mode/v1.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-vl-plus",  # Replace this with another multimodal model and adjust the messages accordingly.
    messages=[
        {"role": "user",
        "content": [{"type": "image_url",
                    "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},},
                    {"type": "text", "text": "What scene is depicted in the image?"}]}],
    stream=True,
  # stream_options={"include_usage": True}
)
full_content = ""
print("Streaming output content:")
for chunk in completion:
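    # When stream_options.include_usage is True, the choices field of the last chunk is an empty list and must be skipped. Token usage is available from chunk.usage.
    # delta.content may also be None or empty, so test its truthiness before concatenating.
    if chunk.choices and chunk.choices[0].delta.content: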
        full_content += chunk.choices[0].delta.content
        print(chunk.choices[0].delta.content)
print(f"Full content: {full_content}")

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
        // If you have not set the environment variable, replace the next line with your Model Studio API key: apiKey: "sk-xxx"
        apiKey: process.env.DASHSCOPE_API_KEY,
        // To use a model in the China (Beijing) region, replace baseURL with https://dashscope.aliyuncs.com/compatible-mode/v1.
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const completion = await openai.chat.completions.create({
    model: "qwen3-vl-plus",  // Replace this with another multimodal model and adjust the messages accordingly.
    messages: [
        {role: "user",
        content: [{"type": "image_url",
                    "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},},
                    {"type": "text", "text": "What scene is depicted in the image?"}]}],
    stream: true,
    // stream_options: { include_usage: true },
});

let fullContent = ""
console.log("Streaming output content:")
for await (const chunk of completion) {
    // When stream_options.include_usage is true, the choices field of the last chunk is an empty array and must be skipped. Token usage is available from chunk.usage.
    if (chunk.choices[0] && chunk.choices[0].delta.content != null) {
      fullContent += chunk.choices[0].delta.content;
      console.log(chunk.choices[0].delta.content);
    }
}
console.log(`Full output content: ${fullContent}`)

curl

# ======= Important =======
# To use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions.
# API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# === Delete this comment before running ===

curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen3-vl-plus",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
          }
        },
        {
          "type": "text",
          "text": "What scene is depicted in the image?"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{"include_usage":true}
}'

DashScope

Python

import os
from dashscope import MultiModalConversation
import dashscope
# To use a model in the China (Beijing) region, replace the base URL with https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
            {"text": "What scene is depicted in the image?"}
        ]
    }
]

responses = MultiModalConversation.call(
    # API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
    # If you have not set the environment variable, replace the next line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model='qwen3-vl-plus',  # Replace this with another multimodal model and adjust the messages accordingly.
    messages=messages,
    stream=True,
    incremental_output=True)
    
full_content = ""
print("Streaming output content:")
for response in responses:
    if response["output"]["choices"][0]["message"].content:
        print(response.output.choices[0].message.content[0]['text'])
        full_content += response.output.choices[0].message.content[0]['text']
print(f"Full content: {full_content}")

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    static {
        // To use a model in the China (Beijing) region, replace the base URL with https://dashscope.aliyuncs.com/api/v1.
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }
    public static void streamCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        // You must create a mutable map.
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"),
                        Collections.singletonMap("text", "What scene is depicted in the image?"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
                // If you have not set the environment variable, replace the next line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-vl-plus")  // Replace this with another multimodal model and adjust the messages accordingly.
                .messages(Arrays.asList(userMessage))
                .incrementalOutput(true)
                .build();
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(item -> {
            try {
                List<Map<String, Object>> content = item.getOutput().getChoices().get(0).getMessage().getContent();
                // Check that the content exists and is not empty.
                if (content != null &&  !content.isEmpty()) {
                    System.out.println(content.get(0).get("text"));
                    }
            } catch (Exception e){
                System.exit(0);
            }
        });
    }

    public static void main(String[] args) {
        try {
            streamCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= Important =======
# API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# To use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation.
# === Delete this comment before running ===

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
                    {"text": "What scene is depicted in the image?"}
                ]
            }
        ]
    },
    "parameters": {
        "incremental_output": true
    }
}'

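Streaming output for thinking models

Thinking models return reasoning_content (the thinking process) first and then content (the response). You can tell whether the model is in the thinking stage or the response stage from the state of each data packet.

For more information about thinking models, see "Deep thinking", "Visual understanding", and "Visual reasoning".

To implement streaming output for Qwen3-Omni-Flash (thinking mode), see "Omni-modal".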

OpenAI compatible

The following shows the response format when you call the qwen-plus model in thinking mode with streaming output through the OpenAI Python SDK.

# Thinking stage
...
ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content='Cover all key points while')
ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content='being natural and fluent.')
# Response stage
ChoiceDelta(content='Hello! I am **Q', function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content=None)
ChoiceDelta(content='wen** (', function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content=None)
...

If reasoning_content is not None and content is None, the model is in the thinking stage.
If reasoning_content is None and content is not None, the model is in the response stage.
If both are None, the stage is the same as in the previous packet.
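
The three rules above can be written as a small state function. This is an illustrative sketch: the name next_stage and the stage labels are hypothetical, and the case where both fields are set is not described on this page, so the sketch assumes the stage is simply left unchanged.

```python
def next_stage(previous, reasoning_content, content):
    """Return "thinking" or "answering" for one packet, following the rules above."""
    if reasoning_content is not None and content is None:
        return "thinking"
    if reasoning_content is None and content is not None:
        return "answering"
    # Both None: same stage as the previous packet.
    # Both set is not covered by the rules; assumed here to keep the stage as well.
    return previous


# (reasoning_content, content) pairs in the order they might arrive.
packets = [
    ("Cover all key points while", None),
    (None, None),
    (None, "Hello! I am **Q"),
    (None, None),
]

stage = None
stages = []
for reasoning, answer in packets:
    stage = next_stage(stage, reasoning, answer)
    stages.append(stage)
print(stages)
```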

Python

Sample code

from openai import OpenAI
import os

# Initialize the OpenAI client.
client = OpenAI(
    # If you have not set the environment variable, replace the next line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

messages = [{"role": "user", "content": "Who are you?"}]

completion = client.chat.completions.create(
    model="qwen-plus",  # Replace this with another deep thinking model as needed.
    messages=messages,
    # The enable_thinking parameter enables the thinking process. It is not supported by the qwen3-30b-a3b-thinking-2507, qwen3-235b-a22b-thinking-2507, and QwQ models.
    extra_body={"enable_thinking": True},
    stream=True,
    # stream_options={
    #     "include_usage": True
    # },
)

reasoning_content = ""  # Complete thinking process
answer_content = ""  # Complete response
is_answering = False  # Whether the response stage has started
print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")

for chunk in completion:
    if not chunk.choices:
        print("\nUsage:")
        print(chunk.usage)
        continue

    delta = chunk.choices[0].delta

    # Collect only the thinking content.
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content

    # Content received: the response stage begins.
    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content

Response

====================Thinking process====================

Okay, the user is asking "Who are you?". I need to provide an accurate and friendly answer. First, I must confirm my identity: Qwen, developed by the Tongyi Lab at Alibaba Group. Next, I should explain my main functions, such as answering questions, creating text, and logical reasoning. I need to maintain a friendly tone and avoid being too technical to make the user feel at ease. I should also avoid complex jargon to keep the answer simple and clear. Additionally, I might add some interactive elements, inviting the user to ask more questions to encourage further conversation. Finally, I'll check if I've missed any important information, such as my Chinese name "Tongyi Qianwen" and English name "Qwen", and my parent company and lab. I need to ensure the answer is comprehensive and meets the user's expectations.
====================Full response====================

Hello! I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can answer questions, create text, perform logical reasoning, write code, and more, all to provide users with high-quality information and services. You can call me Qwen, or just Tongyi Qianwen. How can I help you?

Node.js

Sample code

import OpenAI from "openai";
import process from 'process';

// Initialize the OpenAI client.
const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY, // Read from the environment variable
    baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = '';
let answerContent = '';
let isAnswering = false;

async function main() {
    try {
        const messages = [{ role: 'user', content: 'Who are you?' }];
        const stream = await openai.chat.completions.create({
            // Replace with another Qwen3 or QwQ model as needed.
            model: 'qwen-plus',
            messages,
            stream: true,
            // enable_thinking turns on the thinking process. It is not supported by qwen3-30b-a3b-thinking-2507, qwen3-235b-a22b-thinking-2507, or the QwQ models.
            enable_thinking: true
        });
        console.log('\n' + '='.repeat(20) + 'Thinking process' + '='.repeat(20) + '\n');

        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\nUsage:');
                console.log(chunk.usage);
                continue;
            }

            const delta = chunk.choices[0].delta;
            
            // Print and collect the thinking content.
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }

            // Once content arrives, the response phase begins.
            if (delta.content !== undefined && delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();

Response

====================Thinking process====================

Okay, the user is asking "Who are you?". I need to answer with my identity. First, I should clearly state that I am Qwen, a large-scale language model developed by Alibaba Cloud. Next, I can mention my main functions, such as answering questions, creating text, and logical reasoning. I should also emphasize my multilingual support, including Chinese and English, so the user knows I can handle requests in different languages. Additionally, I might need to explain my application scenarios, such as helping with study, work, and daily life. However, the user's question is quite direct, so I should keep it concise. I also need to ensure a friendly tone and invite the user to ask further questions. I will check for any missing important information, such as my version or latest updates, but the user probably doesn't need that level of detail. Finally, I will confirm the answer is accurate and free of errors.
====================Full response====================

I am Qwen, a large-scale language model developed by the Tongyi Lab at Alibaba Group. I can perform various tasks such as answering questions, creating text, logical reasoning, and coding. I support multiple languages, including Chinese and English. If you have any questions or need help, feel free to let me know!

HTTP

Sample code

curl

For Qwen3 open-source models, set enable_thinking to true to enable thinking mode. The enable_thinking parameter has no effect on qwen3-30b-a3b-thinking-2507, qwen3-235b-a22b-thinking-2507, or the QwQ models.

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "user", 
            "content": "Who are you?"
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "enable_thinking": true
}'

Response

data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}

.....

data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}

data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":10,"completion_tokens":360,"total_tokens":370},"created":1745485391,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-e2edaf2c-8aaf-9e54-90e2-b21dd5045503"}

data: [DONE]
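If you read the raw HTTP stream instead of using an SDK, the output above can be consumed line by line. The following is a minimal sketch under stated assumptions: the helper names iter_sse_chunks and collect_stream are hypothetical, and each event is assumed to arrive as a single "data:" line, as in the output above.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each "data:" line until the [DONE] marker."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # Skip blank separator lines and anything that is not data.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)


def collect_stream(lines: Iterable[str]) -> Tuple[str, str, Optional[dict]]:
    """Return (thinking text, response text, usage) for one streamed reply."""
    reasoning, answer, usage = [], [], None
    for chunk in iter_sse_chunks(lines):
        if not chunk["choices"]:
            # With include_usage enabled, the last chunk has no choices, only usage.
            usage = chunk.get("usage")
            continue
        delta = chunk["choices"][0]["delta"]
        reasoning.append(delta.get("reasoning_content") or "")
        answer.append(delta.get("content") or "")
    return "".join(reasoning), "".join(answer), usage
```

Treating null fields as empty strings keeps the accumulation loop free of special cases, at the cost of no longer distinguishing null from an empty string.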

DashScope

The following shows the data format returned when you call the qwen-plus model in thinking mode with the DashScope Python SDK.

# Thinking phase
...
{"role": "assistant", "content": "", "reasoning_content": "informative, "}
{"role": "assistant", "content": "", "reasoning_content": "so the user finds it helpful."}
# Response phase
{"role": "assistant", "content": "I am Qwen", "reasoning_content": ""}
{"role": "assistant", "content": ", developed by Tongyi Lab", "reasoning_content": ""}
...

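If reasoning_content is not an empty string and content is an empty string, the model is in the thinking phase.
If reasoning_content is an empty string and content is not an empty string, the model is in the response phase.
If both are empty strings, the phase is the same as in the previous packet.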

Python

Sample code

import os
from dashscope import Generation
import dashscope
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1/"

messages = [{"role": "user", "content": "Who are you?"}]

completion = Generation.call(
    # If you have not set the environment variable, replace the next line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # Replace with another deep-thinking model as needed.
    model="qwen-plus",
    messages=messages,
    result_format="message",  # Open-source Qwen3 models support only "message". For a better experience, set this parameter to "message" for other models as well.
    # Enable deep thinking. This parameter has no effect on qwen3-30b-a3b-thinking-2507, qwen3-235b-a22b-thinking-2507, or the QwQ models.
    enable_thinking=True,
    stream=True,
    incremental_output=True,  # Open-source Qwen3 models support only True. For a better experience, set this parameter to True for other models as well.
)

# Accumulates the full thinking process.
reasoning_content = ""
# Accumulates the full response.
answer_content = ""
# Whether the thinking process has finished and the response is being generated.
is_answering = False

print("=" * 20 + "Thinking process" + "=" * 20)

for chunk in completion:
    message = chunk.output.choices[0].message
    # Skip packets in which both the thinking process and the response are empty.
    if message.content == "" and message.reasoning_content == "":
        continue
    # The current packet belongs to the thinking process.
    if message.reasoning_content != "" and message.content == "":
        print(message.reasoning_content, end="", flush=True)
        reasoning_content += message.reasoning_content
    # The current packet belongs to the response.
    elif message.content != "":
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20)
            is_answering = True
        print(message.content, end="", flush=True)
        answer_content += message.content

# To print the full thinking process and response, uncomment and run the following code.
# print("=" * 20 + "Full thinking process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Full response" + "=" * 20 + "\n")
# print(f"{answer_content}")

Response

====================Thinking process====================
Okay, the user is asking, "Who are you?" I need to answer this question. First, I must clarify my identity: Qwen, a large-scale language model developed by Alibaba Cloud. Next, I should explain my functions and purposes, such as answering questions, creating text, and logical reasoning. I should also emphasize my goal of being a helpful assistant to users, providing help and support.

When responding, I should maintain a conversational tone and avoid using technical jargon or complex sentence structures. I can use friendly expressions, like "Hello there! ~", to make the conversation more natural. I also need to ensure the information is accurate and does not omit key points, such as my developer, main functions, and application scenarios.

I should also consider potential follow-up questions from the user, such as specific application examples or technical details. So, I can subtly set up opportunities in my answer to guide the user to ask more questions. For example, by mentioning, "Whether it's a question about daily life or a professional field, I can do my best to help," which is both comprehensive and open-ended.

Finally, I will check if the response is fluent, without repetition or redundancy, ensuring it is concise and clear. I will also maintain a balance between being friendly and professional, so the user feels that I am both approachable and reliable.
====================Full response====================
Hello there! ~ I am Qwen, a large-scale language model developed by Alibaba Cloud. I can answer questions, create text, perform logical reasoning, write code, and more, all to provide help and support to users. Whether it's a question about daily life or a professional field, I can do my best to help. How can I assist you?

Java

Sample code

// dashscope SDK version >= 2.19.4
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    static {
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;

    private static void handleGenerationResult(GenerationResult message) {
        String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String content = message.getOutput().getChoices().get(0).getMessage().getContent();

        if (!reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Thinking process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }

        if (!content.isEmpty()) {
            finalContent.append(content);
            if (!isFirstPrint) {
                System.out.println("\n====================Full response====================");
                isFirstPrint = true;
            }
            System.out.print(content);
        }
    }
    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                // If you have not set the environment variable, replace the next line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-plus")
                .enableThinking(true)
                .incrementalOutput(true)
                .resultFormat("message")
                .messages(Arrays.asList(userMsg))
                .build();
    }
    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.blockingForEach(message -> handleGenerationResult(message));
    }

    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
            streamCallWithMessage(gen, userMsg);
//             Print the final result.
//            if (reasoningContent.length() > 0) {
//                System.out.println("\n====================Full response====================");
//                System.out.println(finalContent.toString());
//            }
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            logger.error("An exception occurred: {}", e.getMessage());
        }
        System.exit(0);
    }
}

Response

====================Thinking process====================
Okay, the user is asking "Who are you?". I need to answer based on my previous settings. First, my role is Qwen, a large-scale language model from Alibaba Group. I need to keep my language conversational, simple, and easy to understand.

The user might be new to me or wants to confirm my identity. I should first directly answer who I am, then briefly explain my functions and uses, such as answering questions, creating text, and coding. I also need to mention my multilingual support so the user knows I can handle requests in different languages.

Also, according to the guidelines, I need to maintain a human-like personality, so my tone should be friendly, and I might use emojis to add a touch of warmth. I might also need to guide the user to ask further questions or use my functions, for example, by asking them what they need help with.

I need to be careful not to use complex jargon and avoid being long-winded. I will check for any missed key points, such as multilingual support and specific capabilities. I will ensure the answer meets all requirements, including being conversational and concise.
====================Full response====================
Hello! I am Qwen, a large-scale language model from Alibaba Group. I can answer questions, create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, express opinions, and play games. I am proficient in multiple languages, including but not limited to Chinese, English, German, French, and Spanish. Is there anything I can help you with?

HTTP

Sample code

curl

For hybrid thinking models, set enable_thinking to true to enable thinking mode. The enable_thinking parameter has no effect on qwen3-30b-a3b-thinking-2507, qwen3-235b-a22b-thinking-2507, or the QwQ models.

# ======= Important =======
# API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key
# To use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === Delete this comment before running ===
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "qwen-plus",
    "input":{
        "messages":[      
            {
                "role": "user",
                "content": "Who are you?"
            }
        ]
    },
    "parameters":{
        "enable_thinking": true,
        "incremental_output": true,
        "result_format": "message"
    }
}'

Response

id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"Hmm","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":14,"input_tokens":11,"output_tokens":3},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":15,"input_tokens":11,"output_tokens":4},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:3
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"user","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":16,"input_tokens":11,"output_tokens":5},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:4
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"asks","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":17,"input_tokens":11,"output_tokens":6},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:5
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"\"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":18,"input_tokens":11,"output_tokens":7},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
......

id:358
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"help","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":373,"input_tokens":11,"output_tokens":362},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:359
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":",","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":374,"input_tokens":11,"output_tokens":363},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:360
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"welcome","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":375,"input_tokens":11,"output_tokens":364},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:361
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"anytime","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":376,"input_tokens":11,"output_tokens":365},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:362
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"tell me","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":377,"input_tokens":11,"output_tokens":366},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:363
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"!","reasoning_content":"","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}

id:364
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","reasoning_content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":378,"input_tokens":11,"output_tokens":367},"request_id":"25d58c29-c47b-9e8d-a0f1-d6c309ec58b1"}
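The event stream above can be split into events and decoded with a few lines of code. The sketch below is illustrative only: parse_sse_events and final_usage are hypothetical helper names. It assumes that events are separated by a blank line, that each event carries exactly one "data:" field, and that lines beginning with a colon (such as ":HTTP_STATUS/200") are SSE comments that can be ignored.

```python
import json
from typing import List, Optional


def parse_sse_events(text: str) -> List[dict]:
    """Split a raw SSE body into events and decode each event's JSON data field."""
    events = []
    for block in text.replace("\r\n", "\n").strip().split("\n\n"):
        event = {}
        for line in block.splitlines():
            if not line or line.startswith(":"):
                continue  # SSE comment line, for example ":HTTP_STATUS/200".
            field, _, value = line.partition(":")
            # The SSE format allows one optional space after the colon.
            event[field] = value[1:] if value.startswith(" ") else value
        if "data" in event:
            event["data"] = json.loads(event["data"])
            events.append(event)
    return events


def final_usage(events: List[dict]) -> Optional[dict]:
    """Return the usage object of the event whose finish_reason is "stop", if any."""
    for event in reversed(events):
        choice = event["data"]["output"]["choices"][0]
        if choice["finish_reason"] == "stop":
            return event["data"]["usage"]
    return None
```

Unlike the OpenAI-compatible stream, every event in this output carries a usage object, so the sketch reads the totals from the event marked "stop" rather than from a separate trailing chunk.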

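Applying in production

Performance and resource management: In a backend service, maintaining a persistent HTTP connection for each streaming request consumes resources. Make sure your service is configured with an appropriate connection pool size and timeout period. In high-concurrency scenarios, monitor the service's file descriptor usage to prevent exhaustion.
Client-side rendering: In a web frontend, use the ReadableStream and TextDecoderStream APIs to process and render the SSE event stream smoothly.
Model monitoring:
- Key metrics: Monitor time to first token (TTFT), the core metric for the streaming experience, together with the API error rate and average response time.
- Alert settings: Set alerts for abnormal API error rates, especially 4xx and 5xx errors.
Nginx proxy configuration: If you use Nginx as a reverse proxy, its default output buffering (proxy_buffering) disrupts real-time streaming responses. To push data to the client immediately, disable it by setting proxy_buffering off in the Nginx configuration file.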
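The TTFT metric mentioned above can be measured on the client by timing the first non-empty chunk. The following is a minimal sketch, not a monitoring integration: measure_stream and its clock parameter are hypothetical, and the input is assumed to be an iterable of text pieces (for example, the delta.content values of a stream, with None mapped to an empty string).

```python
import time
from typing import Callable, Iterable


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Consume a stream of text chunks and report TTFT and total duration in seconds."""
    start = clock()
    first = None
    count = 0
    for chunk in chunks:
        # TTFT is taken at the first chunk that actually carries text.
        if first is None and chunk:
            first = clock()
        count += 1
    end = clock()
    return {
        "ttft": None if first is None else first - start,
        "total": end - start,
        "chunks": count,
    }
```

Passing the clock as a parameter makes the function easy to test with a fake clock. In a service, the returned values would typically be forwarded to whatever metrics system is already in use.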

Error codes

If a call fails, see "Error messages" to troubleshoot.

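FAQ

Q: Why does the returned data not include usage information?

A: By default, the OpenAI protocol does not return usage information. To include it in the final packet, set stream_options={"include_usage": true}.

Q: Does streaming output affect the quality of the model's responses?

A: No. However, some models support only streaming output, and non-streaming calls can cause timeout errors. We recommend using streaming output.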