Use visual reasoning models - Alibaba Cloud Documentation Center

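Visual reasoning models output their thinking process first and then the answer. This makes them suitable for complex visual analysis tasks such as solving math problems, analyzing chart data, and understanding complex videos.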


Availability

Supported regions

Singapore: Use an API key from this region.
US (Virginia): Use an API key from this region.
China (Beijing): Use an API key from this region.
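
Because each region has its own API key and endpoint, it can help to keep the region-to-endpoint mapping in one place. The following is a minimal sketch, assuming the OpenAI-compatible base URLs quoted in the code comments on this page. The `BASE_URLS` dictionary, its region keys, and the `base_url_for` helper are hypothetical names for illustration, not part of any SDK.

```python
# Hypothetical helper: maps an illustrative region label to the
# OpenAI-compatible base URL quoted in this page's code samples.
BASE_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "us-virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
    "china-beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}


def base_url_for(region: str) -> str:
    """Return the base URL for a region label, or raise ValueError if unknown."""
    try:
        return BASE_URLS[region.strip().lower()]
    except KeyError:
        raise ValueError(f"Unsupported region: {region!r}") from None
```

The result can be passed as `base_url` when the client is created, together with an API key issued for the same region.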

Supported models

Global

In the global deployment mode, the endpoint and data storage are located in the US (Virginia) region, and model inference compute resources are scheduled dynamically worldwide.

Hybrid thinking models: qwen3-vl-plus, qwen3-vl-plus-2025-09-23, qwen3-vl-flash, qwen3-vl-flash-2025-10-15
Thinking-only models: qwen3-vl-235b-a22b-thinking, qwen3-vl-32b-thinking, qwen3-vl-30b-a3b-thinking, qwen3-vl-8b-thinking

International

In the international deployment mode, the endpoint and data storage are located in the Singapore region, and model inference compute resources are scheduled dynamically worldwide, excluding the Chinese mainland.

Qwen3-VL
- Hybrid thinking models: qwen3-vl-plus, qwen3-vl-plus-2025-12-19, qwen3-vl-plus-2025-09-23, qwen3-vl-flash, qwen3-vl-flash-2025-10-15
- Thinking-only models: qwen3-vl-235b-a22b-thinking, qwen3-vl-32b-thinking, qwen3-vl-30b-a3b-thinking, qwen3-vl-8b-thinking
QVQ
- Thinking-only models: qvq-max series, qvq-plus series

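US

In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region, and model inference compute resources are limited to the United States.

Hybrid thinking models: qwen3-vl-flash-us, qwen3-vl-flash-2025-10-15-us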

Chinese mainland

In the Chinese mainland deployment mode, the endpoint and data storage are located in the China (Beijing) region, and model inference compute resources are limited to the Chinese mainland.

Qwen3-VL
- Hybrid thinking models: qwen3-vl-plus, qwen3-vl-plus-2025-12-19, qwen3-vl-plus-2025-09-23, qwen3-vl-flash, qwen3-vl-flash-2025-10-15
- Thinking-only models: qwen3-vl-235b-a22b-thinking, qwen3-vl-32b-thinking, qwen3-vl-30b-a3b-thinking, qwen3-vl-8b-thinking
QVQ
- Thinking-only models: qvq-max series, qvq-plus series
Kimi
- Hybrid thinking models: kimi-k2.5

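Usage guide

Thinking process: Model Studio provides two types of visual reasoning models: hybrid thinking models and thinking-only models.
- Hybrid thinking models: Use the enable_thinking parameter to control thinking behavior.
  - Set it to true to enable thinking. The model outputs its thinking process first and then the final response.
  - Set it to false to disable thinking. The model generates the response directly.
- Thinking-only models: These models always generate a thinking process before responding, and this behavior cannot be disabled.
Output method: Because visual reasoning models produce a detailed thinking process, use streaming output to prevent timeouts caused by long responses.
- Qwen3-VL and kimi-k2.5 support both streaming and non-streaming output.
- The QVQ series supports streaming output only.
System prompt recommendations:
- Single-turn or simple conversations: For the best reasoning results, do not set a System Message. Pass instructions such as role settings and output format requirements through the User Message.
- Complex applications such as building agents or implementing tool calls: Use a System Message to define the model's role, capabilities, and behavioral framework so that it behaves stably and reliably.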
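
The rules above (hybrid models accept enable_thinking, thinking-only models always think, and QVQ streams only) can be sketched as a small request planner. This is an illustrative sketch, not an official API: `plan_request` and `HYBRID_PREFIXES` are hypothetical names, and classifying models by name prefix or suffix is an assumption based on the model lists on this page.

```python
# Assumption: hybrid thinking models can be recognized by these name prefixes,
# which also cover dated snapshots such as qwen3-vl-plus-2025-09-23.
HYBRID_PREFIXES = ("qwen3-vl-plus", "qwen3-vl-flash", "kimi-k2.5")


def plan_request(model: str, thinking: bool = True, stream: bool = True) -> dict:
    """Return keyword arguments for an OpenAI-compatible chat completion call."""
    kwargs = {"model": model, "stream": stream}
    if model.startswith("qvq-"):
        # QVQ series: thinking cannot be disabled, and only streaming is supported.
        if not thinking:
            raise ValueError("QVQ models always produce a thinking process")
        if not stream:
            raise ValueError("QVQ models support streaming output only")
    elif model.endswith("-thinking"):
        # Thinking-only Qwen3-VL models: thinking cannot be disabled.
        if not thinking:
            raise ValueError("Thinking-only models cannot disable thinking")
    elif model.startswith(HYBRID_PREFIXES):
        # Hybrid models: enable_thinking is a non-standard parameter, so the
        # OpenAI Python SDK needs it inside extra_body.
        kwargs["extra_body"] = {"enable_thinking": thinking}
    else:
        raise ValueError(f"Unknown visual reasoning model: {model!r}")
    return kwargs
```

The returned dictionary can be unpacked into `client.chat.completions.create(messages=..., **plan_request("qwen3-vl-plus", thinking=False))`. Raising an error for unsupported combinations, rather than silently correcting them, is a design choice of this sketch.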

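Get started

Prerequisites

You have created an API key and exported it as an environment variable.
If you call the model through an SDK, install the latest SDK version. The DashScope Python SDK must be version 1.24.6 or later, and the DashScope Java SDK must be version 2.21.10 or later.

The following examples show how to call the qvq-max model to solve a math problem from an image. They use streaming output to display the thinking process and the final response separately.

OpenAI compatible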

Python

from openai import OpenAI
import os

# Initialize the OpenAI client
client = OpenAI(
    # API keys differ by region. To get one, see https://bailian.console.alibabacloud.com/?tab=model#/api-key
    # If you have not set the environment variable, replace the next line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # This is the base URL for the Singapore region. For models in the China (Beijing) region, use https://dashscope.aliyuncs.com/compatible-mode/v1 instead.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)

reasoning_content = ""  # Accumulates the full thinking process
answer_content = ""     # Accumulates the full response
is_answering = False    # Becomes True once the thinking process ends and the response starts

# Create the chat completion request
completion = client.chat.completions.create(
    model="qvq-max",  # This example uses qvq-max. Replace it with another model name as needed.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
                    },
                },
                {"type": "text", "text": "How do I solve this problem?"},
            ],
        },
    ],
    stream=True,
    # To return token usage in the last chunk, uncomment the following lines
    # stream_options={
    #     "include_usage": True
    # }
)

print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")

for chunk in completion:
    # If chunk.choices is empty, the chunk carries token usage
    if not chunk.choices:
        print("\nUsage:")
        print(chunk.usage)
        continue
    delta = chunk.choices[0].delta
    # Print the thinking process
    if getattr(delta, "reasoning_content", None) is not None:
        print(delta.reasoning_content, end='', flush=True)
        reasoning_content += delta.reasoning_content
    # Print the response; skip chunks whose content is empty or None
    elif delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end='', flush=True)
        answer_content += delta.content

# print("=" * 20 + "Full thinking process" + "=" * 20 + "\n")
# print(reasoning_content)
# print("=" * 20 + "Full response" + "=" * 20 + "\n")
# print(answer_content)

Node.js

import OpenAI from "openai";
import process from 'process';

// Initialize the OpenAI client
const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY, // Read from the environment variable. API keys differ by region. To get one, see https://bailian.console.alibabacloud.com/?tab=model#/api-key
    // This is the base URL for the Singapore region. For models in the China (Beijing) region, set baseURL to https://dashscope.aliyuncs.com/compatible-mode/v1 instead.
    baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = '';
let answerContent = '';
let isAnswering = false;

let messages = [
    {
        role: "user",
        content: [
        { type: "image_url", image_url: { "url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg" } },
        { type: "text", text: "Solve this problem" },
    ]
}]

async function main() {
    try {
        const stream = await openai.chat.completions.create({
            model: 'qvq-max',
            messages: messages,
            stream: true
        });

        console.log('\n' + '='.repeat(20) + 'Thinking process' + '='.repeat(20) + '\n');

        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\nUsage:');
                console.log(chunk.usage);
                continue;
            }

            const delta = chunk.choices[0].delta;

            // Handle the thinking process
            if (delta.reasoning_content) {
                process.stdout.write(delta.reasoning_content);
                reasoningContent += delta.reasoning_content;
            }
            // Handle the final response
            else if (delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();

Example output (thinking process and full response)

====================Thinking process====================

Okay, I need to solve this problem about the surface area and volume of a rectangular prism and a cube. First, I need to carefully look at the problem to make sure I understand the requirements for each part.

The problem asks to calculate the surface area and volume of two shapes separately. The first shape is a rectangular prism, and the second is a cube. The units are all in centimeters, and each question is worth 6 points, for a total of 12 points. This looks like a math homework problem, possibly from elementary or middle school geometry.

First, I need to recall the formulas for the surface area and volume of a rectangular prism and a cube. For a rectangular prism, the formula for surface area is: \( 2(ab + bc + ac) \), where a, b, and c are the length, width, and height, respectively. The volume is \( abc \). For a cube, because all side lengths are equal, the surface area is \( 6a^2 \), and the volume is \( a^3 \).

Now, let's look at the first shape, a rectangular prism with given dimensions of length 4 cm, width 3 cm, and height 2 cm. I need to calculate its surface area first. Plugging the values into the formula:

Surface Area = \( 2(4×3 + 3×2 + 4×2) \)
        = \( 2(12 + 6 + 8) \)
        = \( 2×26 \)
        = 52 cm²

Next is the volume, again plugging in the values:

Volume = \( 4×3×2 \)
      = 24 cm³

Then, for the second shape, a cube with a side length of 3 cm. The surface area is calculated as follows:

Surface Area = \( 6×3^2 \)
        = \( 6×9 \)
        = 54 cm²

And the volume is:

Volume = \( 3^3 \)
      = 27 cm³

However, I need to check for any potential mistakes. For example, in calculating the surface area of the rectangular prism, did I miscalculate the area of any face? Or did I make a mistake in the multiplication?

For instance, the three different faces of the rectangular prism are 4×3, 3×2, and 4×2. The areas of these faces are 12, 6, and 8. Adding them up gives 26, and multiplying by 2 gives 52, which should be correct. For the volume, 4×3=12, and then multiplying by 2 is 24, which is also correct.

For the cube, each face is 3×3=9, and with six faces, the total is 54. The volume is 3×3×3=27, which is also correct.

So the final answers should be:

1. The surface area of the rectangular prism is 52 square centimeters, and the volume is 24 cubic centimeters.
2. The surface area of the cube is 54 square centimeters, and the volume is 27 cubic centimeters.

However, I need to confirm if there is any other hidden information or potential traps in the figures in the problem. For example, sometimes problems might include dashed lines or shaded areas, but in this case, the rectangular prism and cube in the image look standard, with no special markings. Therefore, I believe my calculations are correct.

====================Full response====================

### Solution:

**1. Rectangular Prism (length 4 cm, width 3 cm, height 2 cm)**
- **Surface Area**:
  \[
  2 \times (4 \times 3 + 3 \times 2 + 4 \times 2) = 2 \times (12 + 6 + 8) = 2 \times 26 = 52 \, \text{cm}^2
  \]
- **Volume**:
  \[
  4 \times 3 \times 2 = 24 \, \text{cm}^3
  \]

**2. Cube (side length 3 cm)**
- **Surface Area**:
  \[
  6 \times 3^2 = 6 \times 9 = 54 \, \text{cm}^2
  \]
- **Volume**:
  \[
  3^3 = 27 \, \text{cm}^3
  \]

**Answer:**
1. The surface area of the rectangular prism is \(52 \, \text{cm}^2\), and its volume is \(24 \, \text{cm}^3\).
2. The surface area of the cube is \(54 \, \text{cm}^2\), and its volume is \(27 \, \text{cm}^3\).

HTTP

# ======= Important =======
# This is the URL for the Singapore region. For models in the China (Beijing) region, use https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions instead.
# API keys differ by region. To get one, see https://bailian.console.alibabacloud.com/?tab=model#/api-key
# === Delete these comments before running ===

curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qvq-max",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
          }
        },
        {
          "type": "text",
          "text": "Solve this problem"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{"include_usage":true}
}'

Example output (thinking process and full response)

data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1742983020,"system_fingerprint":null,"model":"qvq-max","id":"chatcmpl-ab4f3963-2c2a-9291-bda2-65d5b325f435"}

data: {"choices":[{"finish_reason":null,"delta":{"content":null,"reasoning_content":"Okay"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1742983020,"system_fingerprint":null,"model":"qvq-max","id":"chatcmpl-ab4f3963-2c2a-9291-bda2-65d5b325f435"}

data: {"choices":[{"delta":{"content":null,"reasoning_content":","},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1742983020,"system_fingerprint":null,"model":"qvq-max","id":"chatcmpl-ab4f3963-2c2a-9291-bda2-65d5b325f435"}

data: {"choices":[{"delta":{"content":null,"reasoning_content":" I am now"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1742983020,"system_fingerprint":null,"model":"qvq-max","id":"chatcmpl-ab4f3963-2c2a-9291-bda2-65d5b325f435"}

data: {"choices":[{"delta":{"content":null,"reasoning_content":" going to"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1742983020,"system_fingerprint":null,"model":"qvq-max","id":"chatcmpl-ab4f3963-2c2a-9291-bda2-65d5b325f435"}

data: {"choices":[{"delta":{"content":null,"reasoning_content":" solve"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1742983020,"system_fingerprint":null,"model":"qvq-max","id":"chatcmpl-ab4f3963-2c2a-9291-bda2-65d5b325f435"}
.....
data: {"choices":[{"delta":{"content":"square "},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1742983095,"system_fingerprint":null,"model":"qvq-max","id":"chatcmpl-23d30959-42b4-9f24-b7ab-1bb0f72ce265"}

data: {"choices":[{"delta":{"content":"centimeters"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1742983095,"system_fingerprint":null,"model":"qvq-max","id":"chatcmpl-23d30959-42b4-9f24-b7ab-1bb0f72ce265"}

data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1742983095,"system_fingerprint":null,"model":"qvq-max","id":"chatcmpl-23d30959-42b4-9f24-b7ab-1bb0f72ce265"}

data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":544,"completion_tokens":590,"total_tokens":1134,"completion_tokens_details":{"text_tokens":590},"prompt_tokens_details":{"text_tokens":24,"image_tokens":520}},"created":1742983095,"system_fingerprint":null,"model":"qvq-max","id":"chatcmpl-23d30959-42b4-9f24-b7ab-1bb0f72ce265"}

data: [DONE]
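
The stream above can also be consumed without an SDK. The following is a minimal sketch, assuming the response body has already been split into text lines shaped like the sample; `split_stream` is a hypothetical helper name. It separates the thinking process from the response and picks up the usage chunk that arrives when include_usage is true.

```python
import json


def split_stream(lines):
    """Collect reasoning text, answer text, and usage from SSE 'data:' lines."""
    reasoning, answer, usage = [], [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        if not chunk.get("choices"):
            # A chunk with an empty choices list carries the token usage.
            usage = chunk.get("usage")
            continue
        delta = chunk["choices"][0]["delta"]
        # Either field may be missing or null, so fall back to an empty string.
        reasoning.append(delta.get("reasoning_content") or "")
        answer.append(delta.get("content") or "")
    return "".join(reasoning), "".join(answer), usage
```

In the sample output, the usage chunk reports 544 prompt tokens and 590 completion tokens, which add up to the reported total of 1134.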

DashScope

Note

When you call a QVQ model with DashScope:

The incremental_output parameter defaults to true and cannot be set to false. Only incremental streaming output is supported.
The result_format parameter defaults to "message" and cannot be set to "text".

Python

import os
import dashscope
from dashscope import MultiModalConversation

# This is the base URL for the Singapore region. For models in the China (Beijing) region, use https://dashscope.aliyuncs.com/api/v1 instead.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
            {"text": "How do I solve this problem?"}
        ]
    }
]

response = MultiModalConversation.call(
    # API keys differ by region. To get one, see https://bailian.console.alibabacloud.com/?tab=model#/api-key
    # If you have not set the environment variable, replace the next line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model="qvq-max",  # This example uses qvq-max. Replace it with another model name as needed.
    messages=messages,
    stream=True,
)

# Accumulates the full thinking process
reasoning_content = ""
# Accumulates the full response
answer_content = ""
# Becomes True once the thinking process ends and the response starts
is_answering = False

print("=" * 20 + "Thinking process" + "=" * 20)

for chunk in response:
    message = chunk.output.choices[0].message
    reasoning_content_chunk = message.get("reasoning_content", None)
    # Skip chunks where both the thinking process and the response are empty
    if message.content == [] and reasoning_content_chunk == "":
        continue
    # The chunk belongs to the thinking process
    if reasoning_content_chunk is not None and message.content == []:
        print(reasoning_content_chunk, end="")
        reasoning_content += reasoning_content_chunk
    # The chunk belongs to the response
    elif message.content != []:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20)
            is_answering = True
        print(message.content[0]["text"], end="")
        answer_content += message.content[0]["text"]

# To print the full thinking process and response, uncomment and run the following lines
# print("=" * 20 + "Full thinking process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Full response" + "=" * 20 + "\n")
# print(f"{answer_content}")

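Example output (thinking process and full response)

====================Thinking process====================
Okay, I need to solve this problem about the surface area and volume of a rectangular prism and a cube. First, I need to read the problem carefully to make sure I understand what each part requires.

The problem asks for the surface area and volume of two shapes. The first shape is a rectangular prism and the second is a cube. All units are centimeters, and each question is worth 6 points, for a total of 12 points. This looks like a geometry homework problem from elementary or middle school.

First, I need to recall the formulas for the surface area and volume of a rectangular prism and a cube. For a rectangular prism, the surface area is \( Area = 2(ab + bc + ac) \), where a, b, and c are the length, width, and height. The volume is \( Volume = abc \). For a cube, all edges are equal, so the surface area is \( Area = 6a^2 \) and the volume is \( Volume = a^3 \).

Now, let's look at the first shape, a rectangular prism with length 4 cm, width 3 cm, and height 2 cm. First, I need to confirm that these values map correctly to the variables in the formula. Usually the three dimensions of a rectangular prism can be named arbitrarily, but for convenience the longest edge can be treated as the length, the middle edge as the width, and the shortest edge as the height. In this problem, however, each edge length is clearly labeled, so I can use the values directly.

Next, I calculate the surface area of the first rectangular prism. Substituting the values into the formula:

\( Area = 2(4×3 + 3×2 + 4×2) \)

First, calculate each term inside the parentheses:

\( 4×3 = 12 \)
\( 3×2 = 6 \)
\( 4×2 = 8 \)

Then add these results:

\( 12 + 6 + 8 = 26 \)

And multiply by 2:

\( 2 × 26 = 52 \)

So the surface area of the first rectangular prism is 52 square centimeters.

Next, calculate the volume:

\( Volume = 4 × 3 × 2 = 24 \)

So the volume is 24 cubic centimeters.

Next, let's look at the second shape, a cube whose edges are all 3 cm long. The surface area is calculated as follows:

\( Area = 6 × 3^2 = 6 × 9 = 54 \)

And the volume is:

\( Volume = 3^3 = 27 \)

So the surface area of the cube is 54 square centimeters and the volume is 27 cubic centimeters.

During the calculation, I need to keep the units consistent. The unit given in the problem is centimeters, so the final results must be in square centimeters and cubic centimeters. Also, when calculating the surface area of the rectangular prism in particular, it is easy to miss a term or make an arithmetic mistake, so I need to check for errors such as the order of multiplication and addition.

I also check for other possible misunderstandings, such as whether the length, width, and height were identified correctly. In this problem each edge length is clearly labeled, so this should not be an issue. For the cube, all edges are equal, so there is no complication from differing edge lengths.

In summary, the surface area of the first rectangular prism is 52 square centimeters and its volume is 24 cubic centimeters. The surface area of the second shape, the cube, is 54 square centimeters and its volume is 27 cubic centimeters.

====================Full response====================
### Solution:

**1. Rectangular prism (length 4 cm, width 3 cm, height 2 cm)**

- **Surface area**:
  \[
  Area = 2(ab + bc + ac) = 2(4×3 + 3×2 + 4×2) = 2(12 + 6 + 8) = 2×26 = 52 \, \text{cm}^2
  \]

- **Volume**:
  \[
  Volume = abc = 4×3×2 = 24 \, \text{cm}^3
  \]

**2. Cube (edge length 3 cm)**

- **Surface area**:
  \[
  Area = 6a^2 = 6×3^2 = 6×9 = 54 \, \text{cm}^2
  \]

- **Volume**:
  \[
  Volume = a^3 = 3^3 = 27 \, \text{cm}^3
  \]

**Answer:**
1. The surface area of the rectangular prism is \(52 \, \text{cm}^2\), and its volume is \(24 \, \text{cm}^3\).
2. The surface area of the cube is \(54 \, \text{cm}^2\), and its volume is \(27 \, \text{cm}^3\).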

Java

// DashScope SDK version >= 2.19.0
import java.util.*;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.exception.InputRequiredException;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    static {
        // This is the base URL for the Singapore region. For models in the China (Beijing) region, use https://dashscope.aliyuncs.com/api/v1 instead.
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;

    private static void handleGenerationResult(MultiModalConversationResult message) {
        String re = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String reasoning = Objects.isNull(re)?"":re; // Default to an empty string

        List<Map<String, Object>> content = message.getOutput().getChoices().get(0).getMessage().getContent();
        if (!reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Thinking process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }

        if (Objects.nonNull(content) && !content.isEmpty()) {
            Object text = content.get(0).get("text");
            finalContent.append(content.get(0).get("text"));
            if (!isFirstPrint) {
                System.out.println("\n====================Full response====================");
                isFirstPrint = true;
            }
            System.out.print(text);
        }
    }
    public static MultiModalConversationParam buildMultiModalConversationParam(MultiModalMessage Msg)  {
        return MultiModalConversationParam.builder()
                // API keys differ by region. To get one, see https://bailian.console.alibabacloud.com/?tab=model#/api-key
                // If you have not set the environment variable, replace the next line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // This example uses qvq-max. Replace it with another model name as needed.
                .model("qvq-max")
                .messages(Arrays.asList(Msg))
                .incrementalOutput(true)
                .build();
    }

    public static void streamCallWithMessage(MultiModalConversation conv, MultiModalMessage Msg)
            throws NoApiKeyException, ApiException, InputRequiredException, UploadFileException {
        MultiModalConversationParam param = buildMultiModalConversationParam(Msg);
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(message -> {
            handleGenerationResult(message);
        });
    }
    public static void main(String[] args) {
        try {
            MultiModalConversation conv = new MultiModalConversation();
            MultiModalMessage userMsg = MultiModalMessage.builder()
                    .role(Role.USER.getValue())
                    .content(Arrays.asList(Collections.singletonMap("image", "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"),
                            Collections.singletonMap("text", "Solve this problem")))
                    .build();
            streamCallWithMessage(conv, userMsg);
//             Print the final result
//            if (reasoningContent.length() > 0) {
//                System.out.println("\n====================Full response====================");
//                System.out.println(finalContent.toString());
//            }
        } catch (ApiException | NoApiKeyException | UploadFileException | InputRequiredException e) {
            logger.error("An exception occurred: {}", e.getMessage());
        }
        System.exit(0);
    }
}

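Example output (thinking process and full response)

====================Thinking process====================
Hmm, I need to solve this problem. It asks me to calculate the surface area and volume of two shapes. First, I need to look carefully at the image provided in the problem. The first shape is a rectangular prism and the second is a cube. The problem asks for the surface area and volume of each, with centimeters as the unit.

Let's look at the first shape, the rectangular prism. Its dimensions should be length, width, and height. According to the labels in the image, the length is 4 cm, the width is 3 cm, and the height is 2 cm. Right? I believe the formula for the surface area of a rectangular prism is 2 × (length × width + length × height + width × height), and the volume is length × width × height. Let me double-check the formula. Yes, the surface area is the sum of the areas of the six faces, and opposite faces have equal areas, so the formula is correct.

Substituting the values, the surface area is 2×(4×3 + 4×2 + 3×2). First, calculate the terms inside the parentheses: 4×3=12, 4×2=8, 3×2=6. Adding them gives 12+8+6=26. Multiplying by 2 gives 52 square centimeters. For the volume, 4×3×2=24 cubic centimeters. This part should be correct.

Next is the second shape, the cube. All edges are 3 cm long. A cube has six identical square faces, so its surface area is six times the square of the edge length. The volume is the cube of the edge length. So the surface area should be 6×3²=6×9=54 square centimeters, and the volume is 3³=27 cubic centimeters. I need to pay attention to the units here. The problem states that the unit is cm, so the results must be written in square centimeters and cubic centimeters.

However, I should double-check that I have not made any mistakes. For example, are the edges of the rectangular prism identified correctly? In the image, the length of the rectangular prism looks longer than the width, so the length is 4, the width is 3, and the height is 2. For the cube, all three dimensions are 3, so there is no problem. Are there any arithmetic mistakes? For example, in the surface area of the rectangular prism, are the products and the sum correct? 4×3=12, 4×2=8, 3×2=6, adding up to 26, times 2 is 52, which is correct. The volume 4×3×2=24 is also correct. For the cube, the surface area 6×9=54 and the volume 27 are also correct.

One thing to watch is the units. The problem clearly states that the unit is cm, so the answer should include the correct unit symbols. The problem also says that each question is worth 6 points for a total of 12 points, and there are only two questions, so each is worth 6 points. This does not affect the calculation, but it is a reminder not to skip steps or units.

In summary, the surface area of the first shape is 52 square centimeters and its volume is 24 cubic centimeters. The surface area of the second shape is 54 square centimeters and its volume is 27 cubic centimeters. That should be right.

====================Full response====================
**Answer:**

1. **Rectangular prism**  
   - **Surface area**: \(2 \times (4 \times 3 + 4 \times 2 + 3 \times 2) = 2 \times 26 = 52\) square centimeters  
   - **Volume**: \(4 \times 3 \times 2 = 24\) cubic centimeters  

2. **Cube**  
   - **Surface area**: \(6 \times 3^2 = 6 \times 9 = 54\) square centimeters  
   - **Volume**: \(3^3 = 27\) cubic centimeters  

**Explanation:**  
- The surface area of the rectangular prism is the total area of its six faces, and its volume is the product of its length, width, and height.  
- The surface area of the cube is the sum of the areas of its six identical square faces, and its volume is the cube of its edge length.  
- All units are in centimeters, as the problem requires.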

HTTP

curl

# ======= Important =======
# This is the URL for the Singapore region. For models in the China (Beijing) region, use https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation instead.
# API keys differ by region. To get one, see https://bailian.console.alibabacloud.com/?tab=model#/api-key
# === Delete these comments before running ===

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qvq-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
                    {"text": "Solve this problem"}
                ]
            }
        ]
    }
}'

Example output (thinking process and full response)

id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[],"reasoning_content":"Okay","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":547,"input_tokens_details":{"image_tokens":520,"text_tokens":24},"output_tokens":3,"input_tokens":544,"output_tokens_details":{"text_tokens":3},"image_tokens":520},"request_id":"f361ae45-fbef-9387-9f35-1269780e0864"}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[],"reasoning_content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":548,"input_tokens_details":{"image_tokens":520,"text_tokens":24},"output_tokens":4,"input_tokens":544,"output_tokens_details":{"text_tokens":4},"image_tokens":520},"request_id":"f361ae45-fbef-9387-9f35-1269780e0864"}

id:3
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[],"reasoning_content":" I am now","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":549,"input_tokens_details":{"image_tokens":520,"text_tokens":24},"output_tokens":5,"input_tokens":544,"output_tokens_details":{"text_tokens":5},"image_tokens":520},"request_id":"f361ae45-fbef-9387-9f35-1269780e0864"}
.....
id:566
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"square"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":1132,"input_tokens_details":{"image_tokens":520,"text_tokens":24},"output_tokens":588,"input_tokens":544,"output_tokens_details":{"text_tokens":588},"image_tokens":520},"request_id":"758b0356-653b-98ac-b4d3-f812437ba1ec"}

id:567
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"centimeters"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":1133,"input_tokens_details":{"image_tokens":520,"text_tokens":24},"output_tokens":589,"input_tokens":544,"output_tokens_details":{"text_tokens":589},"image_tokens":520},"request_id":"758b0356-653b-98ac-b4d3-f812437ba1ec"}

id:568
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[],"role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":1134,"input_tokens_details":{"image_tokens":520,"text_tokens":24},"output_tokens":590,"input_tokens":544,"output_tokens_details":{"text_tokens":590},"image_tokens":520},"request_id":"758b0356-653b-98ac-b4d3-f812437ba1ec"}
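
The DashScope event stream uses a different schema from the OpenAI-compatible one: each event carries `id:`, `event:`, and status lines, the response text sits in a `content` list, and `finish_reason` is the string "null" until the final event. The following is a minimal sketch of reading it, assuming the body has been split into lines shaped like the sample above; `read_dashscope_events` is a hypothetical helper name.

```python
import json


def read_dashscope_events(lines):
    """Collect reasoning text, answer text, and the latest usage from DashScope SSE lines."""
    reasoning, answer, usage = [], [], None
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip id:, event:, :HTTP_STATUS lines and blank separators
        event = json.loads(line[len("data:"):])
        choice = event["output"]["choices"][0]
        message = choice["message"]
        # reasoning_content is absent once the model starts answering.
        reasoning.append(message.get("reasoning_content") or "")
        # content is a list of parts; it is empty while the model is thinking.
        for part in message.get("content") or []:
            answer.append(part.get("text", ""))
        # Each event reports running totals, so keep only the most recent usage.
        usage = event.get("usage", usage)
        if choice.get("finish_reason") == "stop":
            break
    return "".join(reasoning), "".join(answer), usage
```

Because every event repeats the running token counts, the usage returned after the last event corresponds to the totals for the whole request.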

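Core features

Enable or disable the thinking process

For scenarios that need a detailed thinking process, such as problem solving or report analysis, use the enable_thinking parameter to turn on thinking mode. The following examples show how.

Important

The enable_thinking parameter is supported only by the qwen3-vl-plus, qwen3-vl-flash, and kimi-k2.5 series.

OpenAI compatible

The enable_thinking and thinking_budget parameters are not standard OpenAI parameters. How you pass them depends on the programming language:

Python SDK: Pass them through the extra_body dictionary.
Node.js SDK: Pass them directly as top-level parameters.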
Python

import os
from openai import OpenAI

client = OpenAI(
    # API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/ja/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # This is the base URL for the Singapore region. For models in the US (Virginia) region, use https://dashscope-us.aliyuncs.com/compatible-mode/v1 instead.
    # For models in the China (Beijing) region, use https://dashscope.aliyuncs.com/compatible-mode/v1 instead.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)

reasoning_content = ""  # Accumulates the full thinking process
answer_content = ""     # Accumulates the full response
is_answering = False    # Becomes True once the thinking process ends and the response starts
enable_thinking = True
# Create the chat completion request
completion = client.chat.completions.create(
    model="qwen3-vl-plus",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
                    },
                },
                {"type": "text", "text": "How do I solve this problem?"},
            ],
        },
    ],
    stream=True,
    # The enable_thinking parameter turns on the thinking process. The thinking_budget parameter sets the maximum number of tokens for the reasoning process.
    # For qwen3-vl-plus and qwen3-vl-flash, enable_thinking turns thinking on or off. For models with the "thinking" suffix, such as qwen3-vl-235b-a22b-thinking, enable_thinking can only be true. This parameter does not apply to other Qwen-VL models.
    extra_body={
        'enable_thinking': enable_thinking
        },

    # To return token usage in the last chunk, uncomment the following lines
    # stream_options={
    #     "include_usage": True
    # }
)

if enable_thinking:
    print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")

for chunk in completion:
    # If chunk.choices is empty, the chunk carries token usage
    if not chunk.choices:
        print("\nUsage:")
        print(chunk.usage)
        continue
    delta = chunk.choices[0].delta
    # Print the thinking process
    if getattr(delta, "reasoning_content", None) is not None:
        print(delta.reasoning_content, end='', flush=True)
        reasoning_content += delta.reasoning_content
    # Print the response; skip chunks whose content is empty or None
    elif delta.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end='', flush=True)
        answer_content += delta.content

# print("=" * 20 + "Full thinking process" + "=" * 20 + "\n")
# print(reasoning_content)
# print("=" * 20 + "Full response" + "=" * 20 + "\n")
# print(answer_content)

Node.js

import OpenAI from "openai";

// Initialize the OpenAI client
const openai = new OpenAI({
  // API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/ja/model-studio/get-api-key
  // If you have not set the environment variable, replace the next line with your Model Studio API key: apiKey: "sk-xxx"
  apiKey: process.env.DASHSCOPE_API_KEY,
  // The base URL below is for the Singapore region. For models in the US (Virginia) region, set baseURL to https://dashscope-us.aliyuncs.com/compatible-mode/v1
  // For models in the Beijing region, set baseURL to https://dashscope.aliyuncs.com/compatible-mode/v1
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
});

let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
let enableThinking = true;

let messages = [
    {
        role: "user",
        content: [
        { type: "image_url", image_url: { "url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg" } },
        { type: "text", text: "Solve this problem" },
    ]
}]

async function main() {
    try {
        const stream = await openai.chat.completions.create({
            model: 'qwen3-vl-plus',
            messages: messages,
            stream: true,
          // Note: in the Node.js SDK, non-standard parameters such as enable_thinking are passed as top-level properties and do not need to be wrapped in extra_body.
          enable_thinking: enableThinking

        });

        if (enableThinking){console.log('\n' + '='.repeat(20) + 'Thinking process' + '='.repeat(20) + '\n');}

        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\nUsage:');
                console.log(chunk.usage);
                continue;
            }

            const delta = chunk.choices[0].delta;

            // Handle the thinking process
            if (delta.reasoning_content) {
                process.stdout.write(delta.reasoning_content);
                reasoningContent += delta.reasoning_content;
            }
            // Handle the final answer
            else if (delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();

curl

# ======= Important =======
# The URL below is for the Singapore region. For models in the US (Virginia) region, replace it with https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
# For models in the Beijing region, replace it with https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/ja/model-studio/get-api-key
# === Delete these comments before running ===

curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen3-vl-plus",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
          }
        },
        {
          "type": "text",
          "text": "Solve this problem"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{"include_usage":true},
    "enable_thinking": true
}'

DashScope

Python

import os
import dashscope
from dashscope import MultiModalConversation

# The base URL below is for the Singapore region. For models in the US (Virginia) region, set dashscope.base_http_api_url to https://dashscope-us.aliyuncs.com/api/v1
# For models in the Beijing region, set dashscope.base_http_api_url to https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

enable_thinking = True

messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
            {"text": "How do I solve this problem?"}
        ]
    }
]

response = MultiModalConversation.call(
    # If you have not set the environment variable, replace the next line with your Model Studio API key: api_key="sk-xxx",
    # API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/ja/model-studio/get-api-key
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model="qwen3-vl-plus",
    messages=messages,
    stream=True,
    # enable_thinking turns the thinking process on or off.
    # For qwen3-vl-plus and qwen3-vl-flash, enable_thinking can enable or disable thinking. For models with the "thinking" suffix (such as qwen3-vl-235b-a22b-thinking), enable_thinking can only be set to true. This parameter does not apply to other Qwen-VL models.
    enable_thinking=enable_thinking

)

# Accumulates the full thinking process
reasoning_content = ""
# Accumulates the full answer
answer_content = ""
# Tracks whether thinking has finished and the answer has started
is_answering = False

if enable_thinking:
    print("=" * 20 + "Thinking process" + "=" * 20)

for chunk in response:
    message = chunk.output.choices[0].message
    reasoning_content_chunk = message.get("reasoning_content", None)
    # Skip chunks in which both the thinking process and the answer are empty
    if message.content == [] and reasoning_content_chunk == "":
        continue
    # Thinking phase: reasoning text is present and the answer is still empty
    if reasoning_content_chunk is not None and message.content == []:
        print(reasoning_content_chunk, end="")
        reasoning_content += reasoning_content_chunk
    # Answer phase
    elif message.content != []:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20)
            is_answering = True
        print(message.content[0]["text"], end="")
        answer_content += message.content[0]["text"]

# To print the complete thinking process and answer, uncomment and run the code below
# print("=" * 20 + "Full thinking process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Full response" + "=" * 20 + "\n")
# print(f"{answer_content}")

Java

// Requires DashScope SDK version >= 2.21.10
import java.util.*;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.exception.InputRequiredException;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    // The base URL below is for the Singapore region. For models in the US (Virginia) region, set Constants.baseHttpApiUrl to https://dashscope-us.aliyuncs.com/api/v1
    // For models in the Beijing region, set Constants.baseHttpApiUrl to https://dashscope.aliyuncs.com/api/v1
    static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}

    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;

    private static void handleGenerationResult(MultiModalConversationResult message) {
        String re = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String reasoning = Objects.isNull(re) ? "" : re; // Default to an empty string

        List<Map<String, Object>> content = message.getOutput().getChoices().get(0).getMessage().getContent();
        if (!reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Thinking process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }

        if (Objects.nonNull(content) && !content.isEmpty()) {
            Object text = content.get(0).get("text");
            finalContent.append(content.get(0).get("text"));
            if (!isFirstPrint) {
                System.out.println("\n====================Full response====================");
                isFirstPrint = true;
            }
            System.out.print(text);
        }
    }
    public static MultiModalConversationParam buildMultiModalConversationParam(MultiModalMessage Msg)  {
        return MultiModalConversationParam.builder()
                // If you have not set the environment variable, replace the next line with your Model Studio API key: .apiKey("sk-xxx")
                // API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/ja/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-vl-plus")
                .messages(Arrays.asList(Msg))
                .enableThinking(true)
                .incrementalOutput(true)
                .build();
    }

    public static void streamCallWithMessage(MultiModalConversation conv, MultiModalMessage Msg)
            throws NoApiKeyException, ApiException, InputRequiredException, UploadFileException {
        MultiModalConversationParam param = buildMultiModalConversationParam(Msg);
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(message -> {
            handleGenerationResult(message);
        });
    }
    public static void main(String[] args) {
        try {
            MultiModalConversation conv = new MultiModalConversation();
            MultiModalMessage userMsg = MultiModalMessage.builder()
                    .role(Role.USER.getValue())
                    .content(Arrays.asList(Collections.singletonMap("image", "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"),
                            Collections.singletonMap("text", "Solve this problem")))
                    .build();
            streamCallWithMessage(conv, userMsg);
//             Print the final result
//            if (reasoningContent.length() > 0) {
//                System.out.println("\n====================Full response====================");
//                System.out.println(finalContent.toString());
//            }
        } catch (ApiException | NoApiKeyException | UploadFileException | InputRequiredException e) {
            logger.error("An exception occurred: {}", e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= Important =======
# API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/ja/model-studio/get-api-key
# The URL below is for the Singapore region. For models in the US (Virginia) region, replace it with https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# For models in the Beijing region, replace it with https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete these comments before running ===

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
                    {"text": "Solve this problem"}
                ]
            }
        ]
    },
    "parameters":{
        "enable_thinking": true,
        "incremental_output": true
    }
}'

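Limit the thinking length

To keep the model from generating an overly long thinking process, use the thinking_budget parameter to cap the number of tokens generated during thinking. If the thinking process exceeds this limit, the content is truncated and the model immediately begins generating the final answer. The default value of thinking_budget is the model's maximum chain-of-thought length. For details, see "Model list".

Important

The thinking_budget parameter is supported only by Qwen3-VL (thinking mode) and kimi-k2.5 (thinking mode).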
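
As a minimal sketch of how these rules might be applied before sending a request, the helper below assembles the two thinking-related parameters into a dictionary that could be passed as extra_body (OpenAI Python SDK) or merged into the request parameters. The function name build_thinking_options and its validation rules are illustrative assumptions, not part of any SDK. In particular, dropping thinking_budget when thinking is disabled is a design choice made here because the budget applies only in thinking mode.

```python
from typing import Optional


def build_thinking_options(enable_thinking: bool,
                           thinking_budget: Optional[int] = None) -> dict:
    """Hypothetical helper: assemble thinking parameters for a request."""
    options = {"enable_thinking": enable_thinking}
    if thinking_budget is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if (isinstance(thinking_budget, bool)
                or not isinstance(thinking_budget, int)
                or thinking_budget <= 0):
            raise ValueError("thinking_budget must be a positive integer")
        # Assumption: the budget is meaningful only in thinking mode,
        # so it is omitted when thinking is disabled.
        if enable_thinking:
            options["thinking_budget"] = thinking_budget
    return options


print(build_thinking_options(True, 81920))   # thinking on, budget included
print(build_thinking_options(False, 81920))  # thinking off, budget dropped
```

With the OpenAI Python SDK, the returned dictionary would be supplied as extra_body=build_thinking_options(True, 81920) in the chat completion call.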

OpenAI compatible

The thinking_budget parameter is not a standard OpenAI parameter. When you use the OpenAI Python SDK, pass it through extra_body.

Python

import os
from openai import OpenAI

client = OpenAI(
    # API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/ja/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The base URL below is for the Singapore region. For models in the US (Virginia) region, set base_url to https://dashscope-us.aliyuncs.com/compatible-mode/v1
    # For models in the Beijing region, set base_url to https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)

reasoning_content = ""  # Accumulates the full thinking process
answer_content = ""     # Accumulates the full answer
is_answering = False    # Tracks whether thinking has finished and the answer has started
enable_thinking = True
# Create the chat completion request
completion = client.chat.completions.create(
    model="qwen3-vl-plus",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
                    },
                },
                {"type": "text", "text": "How do I solve this problem?"},
            ],
        },
    ],
    stream=True,
    # enable_thinking turns the thinking process on or off. thinking_budget sets the maximum number of tokens for the thinking process.
    # For qwen3-vl-plus and qwen3-vl-flash, enable_thinking can enable or disable thinking. For models with the "thinking" suffix (such as qwen3-vl-235b-a22b-thinking), enable_thinking can only be set to true. This parameter does not apply to other Qwen-VL models.
    extra_body={
        'enable_thinking': enable_thinking,
        'thinking_budget': 81920,
    },

    # Uncomment the following to return token usage in the last chunk
    # stream_options={
    #     "include_usage": True
    # }
)

if enable_thinking:
    print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")

for chunk in completion:
    # If chunk.choices is empty, the chunk carries usage information
    if not chunk.choices:
        print("\nUsage:")
        print(chunk.usage)
    else:
        delta = chunk.choices[0].delta
        # Print the thinking process
        if getattr(delta, 'reasoning_content', None) is not None:
            print(delta.reasoning_content, end='', flush=True)
            reasoning_content += delta.reasoning_content
        elif delta.content:
            # The answer starts with the first non-empty content chunk
            if not is_answering:
                print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
                is_answering = True
            # Print the answer as it streams
            print(delta.content, end='', flush=True)
            answer_content += delta.content

# print("=" * 20 + "Full thinking process" + "=" * 20 + "\n")
# print(reasoning_content)
# print("=" * 20 + "Full response" + "=" * 20 + "\n")
# print(answer_content)

Node.js

import OpenAI from "openai";

// Initialize the OpenAI client
const openai = new OpenAI({
  // API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/ja/model-studio/get-api-key
  // If you have not set the environment variable, replace the next line with your Model Studio API key: apiKey: "sk-xxx"
  apiKey: process.env.DASHSCOPE_API_KEY,
  // The base URL below is for the Singapore region. For models in the US (Virginia) region, set baseURL to https://dashscope-us.aliyuncs.com/compatible-mode/v1
  // For models in the Beijing region, set baseURL to https://dashscope.aliyuncs.com/compatible-mode/v1
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
});

let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
let enableThinking = true;

let messages = [
    {
        role: "user",
        content: [
        { type: "image_url", image_url: { "url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg" } },
        { type: "text", text: "Solve this problem" },
    ]
}]

async function main() {
    try {
        const stream = await openai.chat.completions.create({
            model: 'qwen3-vl-plus',
            messages: messages,
            stream: true,
          // Note: in the Node.js SDK, non-standard parameters such as enable_thinking and thinking_budget are passed as top-level properties and do not need to be wrapped in extra_body.
          enable_thinking: enableThinking,
          thinking_budget: 81920

        });

        if (enableThinking){console.log('\n' + '='.repeat(20) + 'Thinking process' + '='.repeat(20) + '\n');}

        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\nUsage:');
                console.log(chunk.usage);
                continue;
            }

            const delta = chunk.choices[0].delta;

            // Handle the thinking process
            if (delta.reasoning_content) {
                process.stdout.write(delta.reasoning_content);
                reasoningContent += delta.reasoning_content;
            }
            // Handle the final answer
            else if (delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();

curl

# ======= Important =======
# The URL below is for the Singapore region. For models in the US (Virginia) region, replace it with https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
# For models in the Beijing region, replace it with https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/ja/model-studio/get-api-key
# === Delete these comments before running ===

curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen3-vl-plus",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
          }
        },
        {
          "type": "text",
          "text": "Solve this problem"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{"include_usage":true},
    "enable_thinking": true,
    "thinking_budget": 81920
}'

DashScope

Python

import os
import dashscope
from dashscope import MultiModalConversation

# The base URL below is for the Singapore region. For models in the US (Virginia) region, set dashscope.base_http_api_url to https://dashscope-us.aliyuncs.com/api/v1
# For models in the Beijing region, set dashscope.base_http_api_url to https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

enable_thinking = True

messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
            {"text": "How do I solve this problem?"}
        ]
    }
]

response = MultiModalConversation.call(
    # If you have not set the environment variable, replace the next line with your Model Studio API key: api_key="sk-xxx",
    # API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/ja/model-studio/get-api-key
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model="qwen3-vl-plus",
    messages=messages,
    stream=True,
    # enable_thinking turns the thinking process on or off.
    # For qwen3-vl-plus and qwen3-vl-flash, enable_thinking can enable or disable thinking. For models with the "thinking" suffix (such as qwen3-vl-235b-a22b-thinking), enable_thinking can only be set to true. This parameter does not apply to other Qwen-VL models.
    enable_thinking=enable_thinking,
    # thinking_budget sets the maximum number of tokens for the thinking process.
    thinking_budget=81920,

)

# Accumulates the full thinking process
reasoning_content = ""
# Accumulates the full answer
answer_content = ""
# Tracks whether thinking has finished and the answer has started
is_answering = False

if enable_thinking:
    print("=" * 20 + "Thinking process" + "=" * 20)

for chunk in response:
    message = chunk.output.choices[0].message
    reasoning_content_chunk = message.get("reasoning_content", None)
    # Skip chunks in which both the thinking process and the answer are empty
    if message.content == [] and reasoning_content_chunk == "":
        continue
    # Thinking phase: reasoning text is present and the answer is still empty
    if reasoning_content_chunk is not None and message.content == []:
        print(reasoning_content_chunk, end="")
        reasoning_content += reasoning_content_chunk
    # Answer phase
    elif message.content != []:
        if not is_answering:
            print("\n" + "=" * 20 + "Full response" + "=" * 20)
            is_answering = True
        print(message.content[0]["text"], end="")
        answer_content += message.content[0]["text"]

# To print the complete thinking process and answer, uncomment and run the code below
# print("=" * 20 + "Full thinking process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Full response" + "=" * 20 + "\n")
# print(f"{answer_content}")

Java

// Requires DashScope SDK version >= 2.21.10
import java.util.*;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.exception.InputRequiredException;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    // The base URL below is for the Singapore region. For models in the US (Virginia) region, set Constants.baseHttpApiUrl to https://dashscope-us.aliyuncs.com/api/v1
    // For models in the Beijing region, set Constants.baseHttpApiUrl to https://dashscope.aliyuncs.com/api/v1
    static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}

    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;

    private static void handleGenerationResult(MultiModalConversationResult message) {
        String re = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String reasoning = Objects.isNull(re) ? "" : re; // Default to an empty string

        List<Map<String, Object>> content = message.getOutput().getChoices().get(0).getMessage().getContent();
        if (!reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Thinking process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }

        if (Objects.nonNull(content) && !content.isEmpty()) {
            Object text = content.get(0).get("text");
            finalContent.append(content.get(0).get("text"));
            if (!isFirstPrint) {
                System.out.println("\n====================Full response====================");
                isFirstPrint = true;
            }
            System.out.print(text);
        }
    }
    public static MultiModalConversationParam buildMultiModalConversationParam(MultiModalMessage Msg)  {
        return MultiModalConversationParam.builder()
                // If you have not set the environment variable, replace the next line with your Model Studio API key: .apiKey("sk-xxx")
                // API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/ja/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-vl-plus")
                .messages(Arrays.asList(Msg))
                .enableThinking(true)
                .thinkingBudget(81920)
                .incrementalOutput(true)
                .build();
    }

    public static void streamCallWithMessage(MultiModalConversation conv, MultiModalMessage Msg)
            throws NoApiKeyException, ApiException, InputRequiredException, UploadFileException {
        MultiModalConversationParam param = buildMultiModalConversationParam(Msg);
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(message -> {
            handleGenerationResult(message);
        });
    }
    public static void main(String[] args) {
        try {
            MultiModalConversation conv = new MultiModalConversation();
            MultiModalMessage userMsg = MultiModalMessage.builder()
                    .role(Role.USER.getValue())
                    .content(Arrays.asList(Collections.singletonMap("image", "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"),
                            Collections.singletonMap("text", "Solve this problem")))
                    .build();
            streamCallWithMessage(conv, userMsg);
//             Print the final result
//            if (reasoningContent.length() > 0) {
//                System.out.println("\n====================Full response====================");
//                System.out.println(finalContent.toString());
//            }
        } catch (ApiException | NoApiKeyException | UploadFileException | InputRequiredException e) {
            logger.error("An exception occurred: {}", e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= Important =======
# API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/ja/model-studio/get-api-key
# The URL below is for the Singapore region. For models in the US (Virginia) region, replace it with https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# For models in the Beijing region, replace it with https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete these comments before running ===

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
                    {"text": "Solve this problem"}
                ]
            }
        ]
    },
    "parameters":{
        "enable_thinking": true,
        "incremental_output": true,
        "thinking_budget": 81920
    }
}'

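More examples

In addition to their reasoning capabilities, visual reasoning models have all the features of visual understanding models. You can combine these features to handle more complex scenarios.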

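Billing

Total cost = (input tokens × price per input token) + (output tokens × price per output token).

The thinking process (reasoning_content) is part of the output content and is billed as output tokens. If a model in thinking mode does not output a thinking process, it is billed at the non-thinking-mode rate.
For how tokens are calculated for images and videos, see "Visual understanding".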
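
The formula above can be sketched as a small calculation. The function name and the per-token prices below are placeholders for illustration, not actual Model Studio rates. The only facts taken from this page are that reasoning tokens count as output tokens and that the total is the sum of the input and output terms. If the usage data you have already includes reasoning tokens in its output token count, pass 0 for reasoning_tokens to avoid counting them twice.

```python
def estimate_cost(input_tokens: int,
                  answer_tokens: int,
                  reasoning_tokens: int,
                  input_price_per_token: float,
                  output_price_per_token: float) -> float:
    """Illustrative cost estimate; all prices are caller-supplied placeholders."""
    # The thinking process (reasoning_content) is billed as output tokens.
    output_tokens = answer_tokens + reasoning_tokens
    return (input_tokens * input_price_per_token
            + output_tokens * output_price_per_token)


# Placeholder prices: 2 units per million input tokens, 8 per million output tokens.
cost = estimate_cost(1_000, 200, 800, 2 / 1_000_000, 8 / 1_000_000)
print(f"{cost:.6f}")
```

In this example the 800 reasoning tokens account for most of the output term, which is why capping the thinking length with thinking_budget can noticeably reduce cost.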

API reference

For input and output parameters, see "Qwen".

Error codes

If a call fails, see "Error messages" for troubleshooting.