
Alibaba Cloud Model Studio: Add visual understanding capabilities

Last Updated: Mar 14, 2026

Models like qwen3.5-plus and kimi-k2.5 support image understanding natively. For text-only models like glm-5 and MiniMax-M2.5, add a local skill or agent to gain visual capabilities.

Note

Image understanding skills consume Coding Plan quota. No additional charges apply.

Prerequisites

  1. You have subscribed to Coding Plan. See Getting started.

  2. You have set up Coding Plan and can use it normally. See Set up AI tools.

Visual support status

| Model | Visual support | Description |
| --- | --- | --- |
| qwen3.5-plus, kimi-k2.5 | Yes | No configuration needed; pass images directly. |
| qwen3-max-2026-01-23, qwen3-coder-next, qwen3-coder-plus, glm-5, glm-4.7, MiniMax-M2.5 | No | A skill or agent is required for visual capabilities. |

Method 1: Use a visual model directly (recommended)

qwen3.5-plus and kimi-k2.5 support image understanding natively. Switch to these models for frequent image work.

| Tool | How to switch |
| --- | --- |
| Claude Code | /model qwen3.5-plus or /model kimi-k2.5 |
| OpenCode | /models → search for and select qwen3.5-plus or kimi-k2.5 |
| Qwen Code | /model → select qwen3.5-plus or kimi-k2.5 |

For other tools, see Set up AI tools. After switching, reference image paths or drag images into conversations.

Method 2: Add visual capabilities using a skill or agent

Text-only models (glm-5, MiniMax-M2.5) require a skill or agent for visual capabilities.

Claude Code

  1. Add a skill

    Create a skills/image-analyzer folder in the .claude directory:

    mkdir -p .claude/skills/image-analyzer

    Create a SKILL.md file:

    ---
    name: image-analyzer
    description: Adds visual understanding to text-only models. Analyzes images like screenshots, charts, and diagrams. Pass the image path to get a description.
    model: qwen3.5-plus
    ---
    qwen3.5-plus has visual understanding capabilities. Use it directly for image understanding.

    Folder structure:

    .claude/
    └── skills/
        └── image-analyzer/
            └── SKILL.md
  2. Get started

    1. Start Claude Code in your project directory, then switch to glm-5 with /model glm-5.

    2. Download alibabacloud.png to your project directory, then ask: Load the image-analyzer skill and describe the information displayed in the alibabacloud.png banner.

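The folder and file created in step 1 can also be scripted in one go. A minimal sketch that writes the same SKILL.md shown above (run from your project root):

```shell
# Create the skill folder and write SKILL.md in a single step.
mkdir -p .claude/skills/image-analyzer
cat > .claude/skills/image-analyzer/SKILL.md <<'EOF'
---
name: image-analyzer
description: Adds visual understanding to text-only models. Analyzes images like screenshots, charts, and diagrams. Pass the image path to get a description.
model: qwen3.5-plus
---
qwen3.5-plus has visual understanding capabilities. Use it directly for image understanding.
EOF
```

The quoted heredoc delimiter ('EOF') keeps the file contents literal, so no shell expansion occurs inside the frontmatter.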

OpenCode

  1. Add an agent

    Create an agents folder in the .opencode directory:

    mkdir -p .opencode/agents

    Create an image-analyzer.md file:

    Note

    The model field must use the provider and model name from your OpenCode configuration. For example, if your provider is named bailian-coding-plan, use bailian-coding-plan/qwen3.5-plus.

    ---
    description: Analyzes images using a vision-capable model. Invoke with @image-analyzer followed by the image path and your question.
    mode: subagent
    model: bailian-coding-plan/qwen3.5-plus
    tools:
      write: false
      edit: false
    ---
    You have vision capabilities. Analyze the image and return a clear description focused on the user's question.

    Folder structure:

    .opencode/
    └── agents/
        └── image-analyzer.md
  2. Get started

    1. Start OpenCode in your project directory, then switch to glm-5.

    2. Download alibabacloud.png to the project folder. Use @ to invoke image-analyzer, then ask: @image-analyzer describe the information displayed in the alibabacloud.png banner.

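As with the Claude Code skill, the agent file from step 1 can be created with a single script. A minimal sketch that writes the same image-analyzer.md shown above (run from your project root; the provider name bailian-coding-plan assumes the setup described in the note):

```shell
# Create the agents folder and write the subagent definition in one step.
mkdir -p .opencode/agents
cat > .opencode/agents/image-analyzer.md <<'EOF'
---
description: Analyzes images using a vision-capable model. Invoke with @image-analyzer followed by the image path and your question.
mode: subagent
model: bailian-coding-plan/qwen3.5-plus
tools:
  write: false
  edit: false
---
You have vision capabilities. Analyze the image and return a clear description focused on the user's question.
EOF
```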

FAQ

Why can't OpenCode + qwen3.5-plus understand images?

Cause: OpenCode doesn't enable visual capabilities by default. You must declare the modalities parameter in the configuration.

Solution: In the OpenCode configuration, add modalities and set input to ["text", "image"]:

Replace sk-sp-xxx with your API key.
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "bailian-coding-plan": {
      "npm": "@ai-sdk/anthropic",
      "name": "Model Studio Coding Plan",
      "options": {
        "baseURL": "https://coding-intl.dashscope.aliyuncs.com/apps/anthropic/v1",
        "apiKey": "sk-sp-xxx"
      },
      "models": {
        "qwen3.5-plus": {
          "name": "Qwen3.5 Plus",
          "modalities": {
            "input": [
              "text",
              "image"
            ],
            "output": [
              "text"
            ]
          },
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 1024
            }
          }
        },
        "kimi-k2.5": {
          "name": "Kimi K2.5",
          "modalities": {
            "input": [
              "text",
              "image"
            ],
            "output": [
              "text"
            ]
          },
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 1024
            }
          }
        }
      }
    }
  }
}
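If images still are not recognized after editing the configuration, it is worth verifying that the modalities block actually landed on the right model entry. The helper below is a hypothetical sketch (not part of OpenCode) that checks a parsed config dict of the shape shown above:

```python
import json

def check_image_input(config: dict, provider: str, model: str) -> bool:
    """Return True if the given model declares "image" among its input modalities."""
    model_cfg = (
        config.get("provider", {})
        .get(provider, {})
        .get("models", {})
        .get(model, {})
    )
    return "image" in model_cfg.get("modalities", {}).get("input", [])

# Example: config = json.load(open("opencode.json")) — path depends on your setup.
```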

Why can't OpenClaw + qwen3.5-plus understand images?

Cause: OpenClaw determines visual support based on the input field in the configuration.

Solution:

  1. In the ~/.openclaw/openclaw.json configuration, ensure the model definition includes "input": ["text", "image"].

    {
      "models": {
        "mode": "merge",
        "providers": {
          "bailian": {
            "baseUrl": "https://coding-intl.dashscope.aliyuncs.com/v1",
            "apiKey": "YOUR_API_KEY",
            "api": "openai-completions",
            "models": [
              {
                "id": "qwen3.5-plus",
                "name": "qwen3.5-plus",
                "reasoning": false,
                "input": ["text", "image"],
                "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
                "contextWindow": 1000000,
                "maxTokens": 65536
              },
              {
                "id": "kimi-k2.5",
                "name": "kimi-k2.5",
                "reasoning": false,
                "input": ["text", "image"],
                "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
                "contextWindow": 262144,
                "maxTokens": 32768
              }
            ]
          }
        }
      },
      "agents": {
        "defaults": {
          "model": {
            "primary": "bailian/qwen3.5-plus"
          },
          "models": {
            "bailian/qwen3.5-plus": {},
            "bailian/kimi-k2.5": {}
          }
        }
      },
      "gateway": {
        "mode": "local"
      }
    }
  2. After modifying the configuration, clear the OpenClaw model cache and restart:

    rm ~/.openclaw/agents/main/agent/models.json
    openclaw gateway restart
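Because OpenClaw keys visual support off the per-model input field, a quick sanity check against the parsed openclaw.json can catch entries that were missed. This is a hypothetical helper (not part of OpenClaw) that assumes the config shape shown above:

```python
import json

def models_missing_image_input(config: dict) -> list:
    """Collect ids of models whose "input" list does not include "image"."""
    missing = []
    for provider in config.get("models", {}).get("providers", {}).values():
        for model in provider.get("models", []):
            if "image" not in model.get("input", []):
                missing.append(model.get("id", "?"))
    return missing

# Example: config = json.load(open("~/.openclaw/openclaw.json"))  # expand the path first
```

An empty return list means every configured model declares image input; any ids it returns are the entries to fix before restarting the gateway.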