
Alibaba Cloud Model Studio: Add visual understanding capabilities

Last Updated: Mar 05, 2026

Some Coding Plan models, such as qwen3.5-plus and kimi-k2.5, support image understanding natively. For text-only models such as glm-5 and MiniMax-M2.5, you can add a local skill or agent to provide visual capabilities.

Note

Image understanding skills consume Coding Plan quota. No additional charges apply.

Prerequisites

  1. You have subscribed to Coding Plan. See Getting started.

  2. You have set up Coding Plan and can use it normally. See Set up AI tools.

Visual support status

| Model | Visual support | Description |
| --- | --- | --- |
| qwen3.5-plus, kimi-k2.5 | Yes | No configuration needed. Pass images directly. |
| qwen3-max-2026-01-23, qwen3-coder-next, qwen3-coder-plus, glm-5, glm-4.7, MiniMax-M2.5 | No | Requires a skill or agent for visual capabilities. |

Method 1: Use a visual model directly (recommended)

qwen3.5-plus and kimi-k2.5 support image understanding natively. If you frequently work with images, switch to these models.

| Tool | How to switch |
| --- | --- |
| Claude Code | /model qwen3.5-plus or /model kimi-k2.5 |
| OpenCode | /models → search for and select qwen3.5-plus or kimi-k2.5 |
| Qwen Code | /model → select qwen3.5-plus or kimi-k2.5 |

For other tools, see Set up AI tools. After switching, reference image paths directly or drag and drop images in conversations.

Method 2: Add visual capabilities using a skill or agent

For models without visual capabilities (glm-5, MiniMax-M2.5), configure a skill or agent.

Claude Code

  1. Add a skill

    Create a skills/image-analyzer folder in the .claude directory:

    mkdir -p .claude/skills/image-analyzer

    Create a SKILL.md file with the following content:

    ---
    name: image-analyzer
    description: Helps models without visual capabilities understand images. Use this skill when you need to analyze image content, extract information, text, or UI elements from an image, or understand any visual content such as screenshots, charts, or architecture diagrams. Pass the image path to get a description.
    model: qwen3.5-plus
    ---
    qwen3.5-plus has visual understanding capabilities. Use the qwen3.5-plus model directly for image understanding.

    Folder structure:

    .claude/
    └── skills/
        └── image-analyzer/
            └── SKILL.md
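    The folder and file above can be created in one step. A minimal shell sketch, run from your project root (the description here is shortened; use the full description shown above for more reliable skill triggering):

```shell
# Create the skill folder and write SKILL.md in one step
# (run from the project root; content mirrors the SKILL.md shown above)
mkdir -p .claude/skills/image-analyzer
cat > .claude/skills/image-analyzer/SKILL.md <<'EOF'
---
name: image-analyzer
description: Helps models without visual capabilities understand images. Pass the image path to get a description.
model: qwen3.5-plus
---
qwen3.5-plus has visual understanding capabilities. Use the qwen3.5-plus model directly for image understanding.
EOF
```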
  2. Get started

    1. In your project directory, run claude to start Claude Code, then switch to glm-5 with /model glm-5.

    2. Download alibabacloud.png to your project directory. Then ask: Load the image-analyzer skill and describe the information displayed in the alibabacloud.png banner.

OpenCode

  1. Add an agent

    Create an agents folder in the .opencode directory:

    mkdir -p .opencode/agents

    Create an image-analyzer.md file with the following content:

    Note

    The model field must use the provider and model name from the OpenCode configuration, in the form provider/model. For example, if the provider is named bailian-coding-plan in your OpenCode setup, use bailian-coding-plan/qwen3.5-plus.

    ---
    description: Analyzes images using a vision-capable model. Use this agent when the user needs to understand image content, extract information from screenshots, diagrams, UI mockups, or any visual content. Invoke with @image-analyzer followed by the image path and your question.
    mode: subagent
    model: bailian-coding-plan/qwen3.5-plus
    tools:
      write: false
      edit: false
    ---
    You have vision capabilities. Analyze the provided image and return a clear, structured description focused on what the user is asking about.

    Folder structure:

    .opencode/
    └── agents/
        └── image-analyzer.md
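    As with the Claude Code skill, the folder and agent file can be created in one step. A shell sketch, run from the project root (adjust the model field to match your own provider name):

```shell
# Create the agents folder and write image-analyzer.md in one step
# (run from the project root; model must match your OpenCode provider name)
mkdir -p .opencode/agents
cat > .opencode/agents/image-analyzer.md <<'EOF'
---
description: Analyzes images using a vision-capable model. Invoke with @image-analyzer followed by the image path and your question.
mode: subagent
model: bailian-coding-plan/qwen3.5-plus
tools:
  write: false
  edit: false
---
You have vision capabilities. Analyze the provided image and return a clear, structured description focused on what the user is asking about.
EOF
```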
  2. Get started

    1. Start OpenCode in your project directory, then switch to glm-5.

    2. Download alibabacloud.png to the project folder. Use @ to invoke image-analyzer, then ask: @image-analyzer describe the information displayed in the alibabacloud.png banner.


FAQ

Why can't OpenCode + qwen3.5-plus understand images?

Cause: OpenCode doesn't enable visual capabilities by default. You must declare the modalities parameter in the configuration.

Solution: In the OpenCode configuration, add modalities and set input to ["text", "image"]:

Replace sk-sp-xxx with your API key.
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "bailian-coding-plan-test": {
      "npm": "@ai-sdk/anthropic",
      "name": "Model Studio Coding Plan",
      "options": {
        "baseURL": "https://coding-intl.dashscope.aliyuncs.com/apps/anthropic/v1",
        "apiKey": "sk-sp-xxx"
      },
      "models": {
        "qwen3.5-plus": {
          "name": "Qwen3.5 Plus",
          "modalities": {
            "input": [
              "text",
              "image"
            ],
            "output": [
              "text"
            ]
          },
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 1024
            }
          }
        },
        "kimi-k2.5": {
          "name": "Kimi K2.5",
          "modalities": {
            "input": [
              "text",
              "image"
            ],
            "output": [
              "text"
            ]
          },
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 1024
            }
          }
        }
      }
    }
  }
}
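After editing, you can sanity-check that the configuration declares image input before restarting OpenCode. A quick grep sketch, assuming the global config lives at ~/.config/opencode/opencode.json (pass a project-level opencode.json path as the first argument if you configured it there):

```shell
# Check whether the OpenCode config declares image input for any model
# (default path is an assumption; pass your config path as the first argument)
CONFIG="${1:-$HOME/.config/opencode/opencode.json}"
if grep -q '"image"' "$CONFIG" 2>/dev/null; then
  echo "image modality declared in $CONFIG"
else
  echo "no image modality found; add \"input\": [\"text\", \"image\"] under modalities"
fi
```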

Why can't OpenClaw + qwen3.5-plus understand images?

Cause: OpenClaw determines visual support based on the input field in the configuration.

Solution:

  1. In the ~/.openclaw/openclaw.json configuration, ensure the model definition includes "input": ["text", "image"].

    {
      "models": {
        "mode": "merge",
        "providers": {
          "bailian": {
            "baseUrl": "https://coding-intl.dashscope.aliyuncs.com/v1",
            "apiKey": "YOUR_API_KEY",
            "api": "openai-completions",
            "models": [
              {
                "id": "qwen3.5-plus",
                "name": "qwen3.5-plus",
                "reasoning": false,
                "input": ["text", "image"],
                "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
                "contextWindow": 1000000,
                "maxTokens": 65536
              },
              {
                "id": "kimi-k2.5",
                "name": "kimi-k2.5",
                "reasoning": false,
                "input": ["text", "image"],
                "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
                "contextWindow": 262144,
                "maxTokens": 32768
              }
            ]
          }
        }
      },
      "agents": {
        "defaults": {
          "model": {
            "primary": "bailian/qwen3.5-plus"
          },
          "models": {
            "bailian/qwen3.5-plus": {},
            "bailian/kimi-k2.5": {}
          }
        }
      },
      "gateway": {
        "mode": "local"
      }
    }
  2. After modifying the configuration, clear the OpenClaw model cache and restart:

    rm ~/.openclaw/agents/main/agent/models.json
    openclaw gateway restart