
Alibaba Cloud Model Studio: Add visual understanding capabilities

Last Updated: Mar 05, 2026

Some Coding Plan models, such as qwen3.5-plus and kimi-k2.5, support image understanding natively. For text-only models such as glm-5 and MiniMax-M2.5, you can add a local skill or agent to provide visual capabilities.

Note

Image understanding skills consume Coding Plan quota. No additional charges apply.

Prerequisites

  1. You have subscribed to Coding Plan. See Getting started.

  2. You have set up Coding Plan and can use it normally. See Set up AI tools.

Visual support status

| Model | Visual support | Description |
| --- | --- | --- |
| qwen3.5-plus, kimi-k2.5 | Yes | No configuration needed. Pass images directly. |
| qwen3-max-2026-01-23, qwen3-coder-next, qwen3-coder-plus, glm-5, glm-4.7, MiniMax-M2.5 | No | Requires a skill or agent for visual capabilities. |

Method 1: Use a visual model directly (recommended)

qwen3.5-plus and kimi-k2.5 support image understanding natively. If you frequently work with images, switch to these models.

| Tool | How to switch |
| --- | --- |
| Claude Code | /model qwen3.5-plus or /model kimi-k2.5 |
| OpenCode | /models → search for and select qwen3.5-plus or kimi-k2.5 |
| Qwen Code | /model → select qwen3.5-plus or kimi-k2.5 |

For other tools, see Set up AI tools. After switching, reference image paths directly or drag and drop images in conversations.

Method 2: Add visual capabilities using a skill or agent

For models without visual capabilities (glm-5, MiniMax-M2.5), configure a skill or agent.

Claude Code

  1. Add a skill

    Create a skills/image-analyzer folder in the .claude directory:

    mkdir -p .claude/skills/image-analyzer

    Create a SKILL.md file with the following content:

    ---
    name: image-analyzer
    description: Helps models without visual capabilities understand images. Use this skill when you need to analyze image content, extract information, text, or UI elements from an image, or understand any visual content such as screenshots, charts, or architecture diagrams. Pass the image path to get a description.
    model: qwen3.5-plus
    ---
    qwen3.5-plus has visual understanding capabilities. Use the qwen3.5-plus model directly for image understanding.

    Folder structure:

    .claude/
    └── skills/
        └── image-analyzer/
            └── SKILL.md
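    The folder and file above can be created in one step. A minimal shell sketch, run from your project root (the description here is shortened; use the full description shown above for more reliable skill triggering):

```shell
# Create the skill folder and write SKILL.md in one step
# (run from the project root; content mirrors the SKILL.md shown above)
mkdir -p .claude/skills/image-analyzer
cat > .claude/skills/image-analyzer/SKILL.md <<'EOF'
---
name: image-analyzer
description: Helps models without visual capabilities understand images. Pass the image path to get a description.
model: qwen3.5-plus
---
qwen3.5-plus has visual understanding capabilities. Use the qwen3.5-plus model directly for image understanding.
EOF
```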
  2. Get started

    1. In your project directory, run claude to start Claude Code, then switch to glm-5 with /model glm-5.

    2. Download alibabacloud.png to your project directory. Then ask: Load the image-analyzer skill and describe the information displayed in the alibabacloud.png banner.

OpenCode

  1. Add an agent

    Create an agents folder in the .opencode directory:

    mkdir -p .opencode/agents

    Create an image-analyzer.md file with the following content:

    Note

    The model field must use the provider and model name from the OpenCode configuration, in the form provider/model. For example, if the provider is named bailian-coding-plan in your OpenCode setup, use bailian-coding-plan/qwen3.5-plus.

    ---
    description: Analyzes images using a vision-capable model. Use this agent when the user needs to understand image content, extract information from screenshots, diagrams, UI mockups, or any visual content. Invoke with @image-analyzer followed by the image path and your question.
    mode: subagent
    model: bailian-coding-plan/qwen3.5-plus
    tools:
      write: false
      edit: false
    ---
    You have vision capabilities. Analyze the provided image and return a clear, structured description focused on what the user is asking about.

    Folder structure:

    .opencode/
    └── agents/
        └── image-analyzer.md
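    As with the Claude Code skill, the folder and agent file can be created in one step. A shell sketch, run from the project root (adjust the model field to match your own provider name):

```shell
# Create the agents folder and write image-analyzer.md in one step
# (run from the project root; model must match your OpenCode provider name)
mkdir -p .opencode/agents
cat > .opencode/agents/image-analyzer.md <<'EOF'
---
description: Analyzes images using a vision-capable model. Invoke with @image-analyzer followed by the image path and your question.
mode: subagent
model: bailian-coding-plan/qwen3.5-plus
tools:
  write: false
  edit: false
---
You have vision capabilities. Analyze the provided image and return a clear, structured description focused on what the user is asking about.
EOF
```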
  2. Get started

    1. Start OpenCode in your project directory, then switch to glm-5.

    2. Download alibabacloud.png to the project folder. Use @ to invoke image-analyzer, then ask: @image-analyzer describe the information displayed in the alibabacloud.png banner.


FAQ

Why can't OpenCode + qwen3.5-plus understand images?

Cause: OpenCode doesn't enable visual capabilities by default. You must declare the modalities parameter in the configuration.

Solution: In the OpenCode configuration, add modalities and set input to ["text", "image"]:

Replace sk-sp-xxx with your API key.
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "bailian-coding-plan-test": {
      "npm": "@ai-sdk/anthropic",
      "name": "Model Studio Coding Plan",
      "options": {
        "baseURL": "https://coding-intl.dashscope.aliyuncs.com/apps/anthropic/v1",
        "apiKey": "sk-sp-xxx"
      },
      "models": {
        "qwen3.5-plus": {
          "name": "Qwen3.5 Plus",
          "modalities": {
            "input": [
              "text",
              "image"
            ],
            "output": [
              "text"
            ]
          },
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 1024
            }
          }
        },
        "kimi-k2.5": {
          "name": "Kimi K2.5",
          "modalities": {
            "input": [
              "text",
              "image"
            ],
            "output": [
              "text"
            ]
          },
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 1024
            }
          }
        }
      }
    }
  }
}
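After editing, you can sanity-check that the configuration declares image input before restarting OpenCode. A quick grep sketch, assuming the global config lives at ~/.config/opencode/opencode.json (pass a project-level opencode.json path as the first argument if you configured it there):

```shell
# Check whether the OpenCode config declares image input for any model
# (default path is an assumption; pass your config path as the first argument)
CONFIG="${1:-$HOME/.config/opencode/opencode.json}"
if grep -q '"image"' "$CONFIG" 2>/dev/null; then
  echo "image modality declared in $CONFIG"
else
  echo "no image modality found; add \"input\": [\"text\", \"image\"] under modalities"
fi
```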

Why can't OpenClaw + qwen3.5-plus understand images?

Cause: OpenClaw determines visual support based on the input field in the configuration.

Solution:

  1. In the ~/.openclaw/openclaw.json configuration, ensure the model definition includes "input": ["text", "image"].

    {
      "models": {
        "mode": "merge",
        "providers": {
          "bailian": {
            "baseUrl": "https://coding-intl.dashscope.aliyuncs.com/v1",
            "apiKey": "YOUR_API_KEY",
            "api": "openai-completions",
            "models": [
              {
                "id": "qwen3.5-plus",
                "name": "qwen3.5-plus",
                "reasoning": false,
                "input": ["text", "image"],
                "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
                "contextWindow": 1000000,
                "maxTokens": 65536
              },
              {
                "id": "kimi-k2.5",
                "name": "kimi-k2.5",
                "reasoning": false,
                "input": ["text", "image"],
                "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
                "contextWindow": 262144,
                "maxTokens": 32768
              }
            ]
          }
        }
      },
      "agents": {
        "defaults": {
          "model": {
            "primary": "bailian/qwen3.5-plus"
          },
          "models": {
            "bailian/qwen3.5-plus": {},
            "bailian/kimi-k2.5": {}
          }
        }
      },
      "gateway": {
        "mode": "local"
      }
    }
  2. After modifying the configuration, clear the OpenClaw model cache and restart:

    rm ~/.openclaw/agents/main/agent/models.json
    openclaw gateway restart