Integrate multimodal generation models with Token Plan (Team Edition) - Alibaba Cloud Model Studio

Image generation models must be integrated through each tool's extension mechanism (Skill, Slash Command, or Agent).

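Overview

AI coding tools cannot call image generation models directly through their model configuration. Instead, integrate the model through the extension mechanism that each tool provides.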

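Example: Claude Code

The following example shows how to integrate an image generation model into Claude Code by using a Slash Command. The process is similar for other tools, but the extension mechanism and the configuration file path differ.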

Step 1: Create a Slash Command

In the project root directory, create the file .claude/commands/text-to-image.md with the following content:

Call the Token Plan text-to-image API to generate an image based on a description.

User request: $ARGUMENTS

## Steps

1. Extract prompt (image description), model (default: qwen-image-2.0), and size (default: 1024*1024) from the user request.

2. Call the API to generate an image (use the Bash tool to run curl):

```
curl -s -X POST "https://token-plan.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
  -H "Authorization: Bearer $ANTHROPIC_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model>",
    "input": {
      "messages": [{"role":"user","content":[{"text":"<prompt>"}]}]
    },
    "parameters": {"size":"<size>"}
  }'
```

3. Extract the image URL from output.choices[*].message.content[*].image in the response JSON.

4. Download the image to the current directory with curl -s -o "generated_$(date +%Y%m%d_%H%M%S).png" "<URL>".

5. Display the generated image file path to the user.

Step 2: Generate an image

In Claude Code, enter /text-to-image draw a cat.
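
For reference, the steps in the command file can be approximated as plain shell functions. This is only a minimal sketch, not part of the official setup: the function names `json_escape`, `build_body`, `extract_image_urls`, and `generate_image` are hypothetical, the prompt is assumed to be a single line, and the response is assumed to carry the URL under an `"image"` key as described in step 3. It uses grep and sed to stay dependency-free; a JSON parser would likely be more robust.

```shell
#!/bin/sh
# Sketch of the slash command's steps. All function names are hypothetical helpers.

API_URL="https://token-plan.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation"

# Escape backslashes and double quotes so the prompt fits inside a JSON string.
# Assumption: the prompt is a single line (newlines are not handled).
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build the request body; model and size fall back to the defaults from step 1.
build_body() {
  prompt=$(json_escape "$1")
  model=${2:-qwen-image-2.0}
  size=${3:-1024*1024}
  printf '{"model":"%s","input":{"messages":[{"role":"user","content":[{"text":"%s"}]}]},"parameters":{"size":"%s"}}' \
    "$model" "$prompt" "$size"
}

# Read a response on stdin and print every value stored under an "image" key.
# Assumption: the URLs contain no escaped characters.
extract_image_urls() {
  grep -o '"image"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/^"image"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Call the API, then download the first image into the current directory.
generate_image() {
  response=$(curl -s -X POST "$API_URL" \
    -H "Authorization: Bearer $ANTHROPIC_AUTH_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(build_body "$@")") || return 1
  url=$(printf '%s\n' "$response" | extract_image_urls | head -n 1)
  [ -n "$url" ] || { printf 'no image URL in response\n' >&2; return 1; }
  out="generated_$(date +%Y%m%d_%H%M%S).png"
  curl -s -o "$out" "$url" && printf '%s\n' "$out"
}
```

With `ANTHROPIC_AUTH_TOKEN` set, calling `generate_image "a cat"` would print the name of the saved file. The optional second and third arguments override the model and size.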

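Other tools

The following table lists the extension mechanism and configuration file path for each tool. Save the same content shown in the Claude Code example to the corresponding path.

Tool	Extension mechanism	Configuration file path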
Claude Code	Slash Command	`.claude/commands/text-to-image.md`
Codex	Skill	`~/.codex/skills/token-plan-image/SKILL.md`
Qwen Code	Skill	`~/.qwen/skills/text-to-image/SKILL.md`
OpenCode	Agent	`.opencode/agents/text-to-image.md`
OpenClaw	Skill	`~/.openclaw/workspace/skills/token-plan-image/SKILL.md`
Hermes Agent	Skill	`~/.hermes/skills/media/text-to-image/SKILL.md`
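
When scripting the setup, the table can be expressed as a small lookup. The sketch below is illustrative only: `config_path_for` and the tool keys such as `claude-code` are hypothetical names, and `~` is written as `$HOME`. The paths themselves are copied from the table.

```shell
#!/bin/sh
# Map a tool key to its configuration file path, as listed in the table.
# config_path_for and the tool keys are hypothetical; only the paths come from the table.
config_path_for() {
  case "$1" in
    claude-code)  printf '%s\n' ".claude/commands/text-to-image.md" ;;
    codex)        printf '%s\n' "$HOME/.codex/skills/token-plan-image/SKILL.md" ;;
    qwen-code)    printf '%s\n' "$HOME/.qwen/skills/text-to-image/SKILL.md" ;;
    opencode)     printf '%s\n' ".opencode/agents/text-to-image.md" ;;
    openclaw)     printf '%s\n' "$HOME/.openclaw/workspace/skills/token-plan-image/SKILL.md" ;;
    hermes-agent) printf '%s\n' "$HOME/.hermes/skills/media/text-to-image/SKILL.md" ;;
    *)            printf 'unknown tool: %s\n' "$1" >&2; return 1 ;;
  esac
}
```

For example, `config_path_for opencode` prints `.opencode/agents/text-to-image.md`. Paths that do not start with `$HOME` are relative to the project root.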

Note

Skill-based tools (Codex, Qwen Code, OpenClaw, and Hermes Agent) require YAML front matter at the top of the configuration file:

---
name: "token-plan-image"
description: "Call the Token Plan text-to-image model to generate images from text descriptions. Activates when the user asks to draw or generate images."
---

(... same content as the Claude Code example above ...)

OpenCode Agent uses a different front matter format:

---
description: "Call the Token Plan text-to-image model to generate images from text descriptions."
mode: subagent
tools:
  bash: true
  write: false
  edit: false
---

(... same content as the Claude Code example above ...)
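
Assuming the body of the Claude Code command is already saved in a file, the two front matter variants above can be prepended with a short script. This is a hedged sketch: `write_extension_file` is a hypothetical helper, and the front matter values are copied verbatim from the examples above.

```shell
#!/bin/sh
# write_extension_file KIND BODY_FILE TARGET
# Hypothetical helper: writes the front matter for KIND ("skill" or "agent"),
# a blank line, and then the contents of BODY_FILE to TARGET.
write_extension_file() {
  kind=$1 body=$2 target=$3

  # Validate the kind before touching the target file.
  case "$kind" in
    skill|agent) ;;
    *) printf 'unknown kind: %s\n' "$kind" >&2; return 1 ;;
  esac

  mkdir -p "$(dirname "$target")" || return 1
  {
    printf '%s\n' '---'
    if [ "$kind" = skill ]; then
      printf '%s\n' 'name: "token-plan-image"'
      printf '%s\n' 'description: "Call the Token Plan text-to-image model to generate images from text descriptions. Activates when the user asks to draw or generate images."'
    else
      printf '%s\n' 'description: "Call the Token Plan text-to-image model to generate images from text descriptions."'
      printf '%s\n' 'mode: subagent' 'tools:' '  bash: true' '  write: false' '  edit: false'
    fi
    printf '%s\n' '---' ''
    cat "$body"
  } > "$target"
}
```

For example, `write_extension_file agent .claude/commands/text-to-image.md .opencode/agents/text-to-image.md` would reuse the Claude Code command body for OpenCode.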