Alibaba Cloud Model Studio:Kimi

Last Updated:Feb 10, 2026

This topic describes how to call Kimi series models on the Alibaba Cloud Model Studio platform using either the OpenAI-compatible API or the DashScope SDK.

Important

This document applies only to the China (Beijing) region. To call these models, use an API key created in the China (Beijing) region.

Model introduction

The Kimi series models are large language models (LLMs) developed by Moonshot AI.

  • kimi-k2.5: Kimi's most intelligent model to date. It achieves state-of-the-art (SOTA) performance among open-source models in agent, coding, visual understanding, and a range of general intelligence tasks. It is also Kimi's most versatile model: its native multimodal architecture supports visual and text input, thinking and non-thinking modes, and both dialogue and agent tasks.

  • kimi-k2-thinking: Supports only deep thinking mode. It displays the reasoning process in the reasoning_content field. This model excels at coding and tool calling. It is suitable for scenarios that require logical analysis, planning, or deep understanding.

  • Moonshot-Kimi-K2-Instruct: Does not support deep thinking. It generates responses directly for faster performance. This model is suitable for scenarios that require quick and direct answers.

Model                      Mode          Context window  Max input  Max CoT   Max response  Input cost       Output cost
                                         (tokens)        (tokens)   (tokens)  (tokens)      (per 1M tokens)  (per 1M tokens)
kimi-k2.5                  Thinking      262,144         258,048    32,768    32,768        $0.574           $3.011
kimi-k2.5                  Non-thinking  262,144         260,096    -         32,768        $0.574           $3.011
kimi-k2-thinking           Thinking      262,144         229,376    32,768    16,384        $0.574           $2.294
Moonshot-Kimi-K2-Instruct  Non-thinking  131,072         131,072    -         8,192         $0.574           $2.294

The above models are not third-party services. They are all deployed on Model Studio servers.
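
The per-1M-token prices listed above can be turned into a rough per-request estimate. The following is a minimal sketch, not an official billing formula: the helper name estimate_cost is hypothetical, and it assumes that the token counts you pass in are the ones reported in the usage field of a response. How thinking tokens are counted toward output is not stated here, so treat the result as an approximation.

```python
# Prices in USD per 1M tokens, copied from the table above: (input, output).
PRICES_PER_1M = {
    "kimi-k2.5": (0.574, 3.011),
    "kimi-k2-thinking": (0.574, 2.294),
    "Moonshot-Kimi-K2-Instruct": (0.574, 2.294),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request (hypothetical helper)."""
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Example: 8 input tokens and 183 output tokens on kimi-k2-thinking.
cost = estimate_cost("kimi-k2-thinking", 8, 183)
print(f"${cost:.6f}")  # prints "$0.000424"
```

An unknown model name raises KeyError, which keeps a typo from silently producing a zero estimate.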

Text generation example

Before you use the API, you must get an API key and set it as an environment variable. If you make calls using an SDK, you must also install the SDK.
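
A missing environment variable usually surfaces later as an authentication error. As a minimal sketch, the check below fails early with a clearer message. The helper name require_api_key is hypothetical; the variable name DASHSCOPE_API_KEY is the one used by the samples in this topic.

```python
import os


def require_api_key(env_var: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key from the environment, or raise a descriptive error."""
    api_key = os.getenv(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Environment variable {env_var} is not set. "
            "Set it to an API key from the China (Beijing) region."
        )
    return api_key
```

The returned value can be passed wherever the samples call os.getenv("DASHSCOPE_API_KEY"), for example as api_key=require_api_key().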

OpenAI compatible

Python

Sample code

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="kimi-k2-thinking",
    messages=[{"role": "user", "content": "Who are you?"}],
    stream=True,
)

reasoning_content = ""  # Complete thinking process
answer_content = ""     # Complete response
is_answering = False    # Indicates whether the model has started generating the response

print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")

for chunk in completion:
    if chunk.choices:
        delta = chunk.choices[0].delta
        # Collect only the thinking content
        if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
            if not is_answering:
                print(delta.reasoning_content, end="", flush=True)
            reasoning_content += delta.reasoning_content
        # When content is received, start generating the response
        if hasattr(delta, "content") and delta.content:
            if not is_answering:
                print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
                is_answering = True
            print(delta.content, end="", flush=True)
            answer_content += delta.content

Response

====================Thinking Process====================

The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.

I am an AI assistant named Kimi, developed by Moonshot AI. I should introduce myself clearly and concisely, including the following:
1. My identity: AI assistant
2. My developer: Moonshot AI
3. My name: Kimi
4. My core capabilities: long-text processing, intelligent conversation, file processing, search, etc.

I should maintain a friendly and professional tone, avoiding overly technical jargon so that regular users can understand. I should also emphasize that I am an AI without personal consciousness, emotions, or experiences.

Response structure:
- Directly state my identity
- Mention my developer
- Briefly introduce my core capabilities
- Keep it concise and clear
====================Complete Response====================

I am an AI assistant named Kimi, developed by Moonshot AI. I am based on a Mixture-of-Experts (MoE) architecture and have capabilities such as long-context understanding, intelligent conversation, file processing, code generation, and complex task reasoning. How can I help you?

Node.js

Sample code

import OpenAI from "openai";
import process from 'process';

// Initialize the OpenAI client
const openai = new OpenAI({
    // If the environment variable is not set, replace this with your Model Studio API key: apiKey: "sk-xxx"
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = ''; // Complete thinking process
let answerContent = ''; // Complete response
let isAnswering = false; // Indicates whether the model has started generating the response

async function main() {
    const messages = [{ role: 'user', content: 'Who are you?' }];

    const stream = await openai.chat.completions.create({
        model: 'kimi-k2-thinking',
        messages,
        stream: true,
    });

    console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');

    for await (const chunk of stream) {
        if (chunk.choices?.length) {
            const delta = chunk.choices[0].delta;
            // Collect only the thinking content
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }

            // When content is received, start generating the response
            if (delta.content !== undefined && delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    }
}

main();

Response

====================Thinking Process====================

The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.

I am an AI assistant named Kimi, developed by Moonshot AI. I should introduce myself clearly and concisely, including the following:
1. My identity: AI assistant
2. My developer: Moonshot AI
3. My name: Kimi
4. My core capabilities: long-text processing, intelligent conversation, file processing, search, etc.

I should maintain a friendly and professional tone, avoiding overly technical jargon so that regular users can easily understand. I should also emphasize that I am an AI without personal consciousness, emotions, or experiences to avoid misunderstandings.

Response structure:
- Directly state my identity
- Mention my developer
- Briefly introduce my core capabilities
- Keep it concise and clear
====================Complete Response====================

I am an AI assistant named Kimi, developed by Moonshot AI.

I am good at:
- Long-text understanding and generation
- Intelligent conversation and Q&A
- File processing and analysis
- Information retrieval and integration

As an AI assistant, I do not have personal consciousness, emotions, or experiences, but I will do my best to provide you with accurate and helpful assistance. How can I help you?

HTTP

Sample code

curl

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "kimi-k2-thinking",
    "messages": [
        {
            "role": "user", 
            "content": "Who are you?"
        }
    ]
}'

Response

{
    "choices": [
        {
            "message": {
                "content": "I am an AI assistant named Kimi, developed by Moonshot AI. I am skilled at long-text processing, intelligent conversation, file analysis, programming assistance, and complex task reasoning. I can help you answer questions, create content, and analyze documents. How can I help you?",
                "reasoning_content": "The user asks \"Who are you?\", which is a direct question about my identity. I need to answer truthfully based on my actual identity.\n\nI am an AI assistant named Kimi, developed by Moonshot AI. I should introduce myself clearly and concisely, including the following:\n1. My identity: AI assistant\n2. My developer: Moonshot AI\n3. My name: Kimi\n4. My core capabilities: long-text processing, intelligent conversation, file processing, search, etc.\n\nI should maintain a friendly and professional tone while providing useful information. No need to overcomplicate it; a direct answer is sufficient.",
                "role": "assistant"
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 8,
        "completion_tokens": 183,
        "total_tokens": 191
    },
    "created": 1762753998,
    "system_fingerprint": null,
    "model": "kimi-k2-thinking",
    "id": "chatcmpl-485ab490-90ec-48c3-85fa-1c732b683db2"
}

DashScope

Python

Sample code

import os
from dashscope import Generation

# Initialize request parameters
messages = [{"role": "user", "content": "Who are you?"}]

completion = Generation.call(
    # If the environment variable is not set, replace this with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="kimi-k2-thinking",
    messages=messages,
    result_format="message",  # Set the result format to message
    stream=True,              # Enable streaming output
    incremental_output=True,  # Enable incremental output
)

reasoning_content = ""  # Complete thinking process
answer_content = ""     # Complete response
is_answering = False    # Indicates whether the model has started generating the response

print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")

for chunk in completion:
    message = chunk.output.choices[0].message
    
    # Collect only the thinking content
    if message.reasoning_content:
        if not is_answering:
            print(message.reasoning_content, end="", flush=True)
        reasoning_content += message.reasoning_content

    # When content is received, start generating the response
    if message.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
            is_answering = True
        print(message.content, end="", flush=True)
        answer_content += message.content

# After the loop, the reasoning_content and answer_content variables contain the complete content.
# You can perform further processing here as needed.
# print(f"\n\nComplete thinking process:\n{reasoning_content}")
# print(f"\nComplete response:\n{answer_content}")

Response

====================Thinking Process====================

The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.

I am an AI assistant named Kimi, developed by Moonshot AI. I should state this clearly and concisely.

Key information to include the following:
1. My name: Kimi
2. My developer: Moonshot AI
3. My nature: Artificial intelligence assistant
4. What I can do: Answer questions, assist with creation, etc.

I should maintain a friendly and helpful tone while accurately stating my identity. I should not pretend to be human or have a personal identity.

A suitable response could be:
"I am Kimi, an artificial intelligence assistant developed by Moonshot AI. I can help you with various tasks such as answering questions, creating content, and analyzing documents. How can I help you?"

This response is direct, accurate, and invites further interaction.
====================Complete Response====================

I am Kimi, an artificial intelligence assistant developed by Moonshot AI. I can help you with various tasks such as answering questions, creating content, and analyzing documents. How can I help you?

Java

Sample code

// DashScope SDK version >= 2.19.4
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Main {
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;

    private static void handleGenerationResult(GenerationResult message) {
        String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String content = message.getOutput().getChoices().get(0).getMessage().getContent();

        if (reasoning != null && !reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================Thinking Process====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }

        if (content != null && !content.isEmpty()) {
            finalContent.append(content);
            if (!isFirstPrint) {
                System.out.println("\n====================Complete Response====================");
                isFirstPrint = true;
            }
            System.out.print(content);
        }
    }
    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                // If the environment variable is not set, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("kimi-k2-thinking")
                .incrementalOutput(true)
                .resultFormat("message")
                .messages(Arrays.asList(userMsg))
                .build();
    }
    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.blockingForEach(message -> handleGenerationResult(message));
    }

    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
            streamCallWithMessage(gen, userMsg);
            // Print the final result
            // if (reasoningContent.length() > 0) {
            //     System.out.println("\n====================Complete Response====================");
            //     System.out.println(finalContent.toString());
            // }
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            logger.error("An exception occurred: {}", e.getMessage());
        }
        System.exit(0);
    }
}

Response

====================Thinking Process====================
The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.

I am an AI assistant named Kimi, developed by Moonshot AI. I should state this clearly and concisely.

The response should include the following:
1. My identity: AI assistant
2. My developer: Moonshot AI
3. My name: Kimi
4. My core capabilities: long-text processing, intelligent conversation, file processing, etc.

I should not pretend to be human or provide excessive technical details. A clear and friendly answer is sufficient.
====================Complete Response====================
I am an AI assistant named Kimi, developed by Moonshot AI. I am skilled at long-text processing, intelligent conversation, answering questions, assisting with creation, and helping you analyze and process files. How can I help you?

HTTP

Sample code

curl

curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "kimi-k2-thinking",
    "input":{
        "messages":[      
            {
                "role": "user",
                "content": "Who are you?"
            }
        ]
    },
    "parameters": {
        "result_format": "message"
    }
}'

Response

{
    "output": {
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "content": "I am Kimi, an artificial intelligence assistant developed by Moonshot AI. I can help you answer questions, create content, analyze documents, and write code. How can I help you?",
                    "reasoning_content": "The user asks \"Who are you?\", which is a direct question about my identity. I need to answer truthfully based on my actual identity.\n\nI am an AI assistant named Kimi, developed by Moonshot AI. I should state this clearly and concisely.\n\nKey information to include the following:\n1. My name: Kimi\n2. My developer: Moonshot AI\n3. My nature: Artificial intelligence assistant\n4. What I can do: Answer questions, assist with creation, etc.\n\nI should respond in a friendly and direct manner that is easy for the user to understand.",
                    "role": "assistant"
                }
            }
        ]
    },
    "usage": {
        "input_tokens": 9,
        "output_tokens": 156,
        "total_tokens": 165
    },
    "request_id": "709a0697-ed1f-4298-82c9-a4b878da1849"
}
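
The two HTTP interfaces return the same information in different shapes: the OpenAI-compatible response puts choices at the top level and reports prompt_tokens and completion_tokens, while the DashScope response nests choices under output and reports input_tokens and output_tokens. The sketch below reads either shape into one dictionary. The function name extract_message and the keys of the returned dictionary are hypothetical, and the code covers only the non-streaming text generation responses shown in this topic.

```python
def extract_message(data: dict) -> dict:
    """Read reasoning, answer, and token counts from either response shape."""
    # DashScope nests choices under "output"; OpenAI-compatible keeps them at the top level.
    choices = data["output"]["choices"] if "output" in data else data["choices"]
    message = choices[0]["message"]
    usage = data.get("usage", {})
    return {
        "reasoning": message.get("reasoning_content") or "",
        "answer": message.get("content") or "",
        # Token field names differ between the two interfaces.
        "input_tokens": usage.get("input_tokens", usage.get("prompt_tokens")),
        "output_tokens": usage.get("output_tokens", usage.get("completion_tokens")),
    }
```

Pass in the parsed JSON body, for example the result of json.loads on the curl output. Multimodal DashScope responses return content as a list of parts rather than a string, so they need separate handling.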

kimi-k2.5 multimodal example

kimi-k2.5 accepts text, image, and video inputs, and can process them together in a single request.

Enable or disable thinking mode

kimi-k2.5 is a hybrid thinking model that can respond either after a thinking process or directly. Control this behavior using the enable_thinking parameter:

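  • true: The model thinks before it responds. The thinking process is returned in the reasoning_content field.

  • false (default): The model responds directly without a thinking process.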
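
With the OpenAI Python SDK, enable_thinking is not a named argument of chat.completions.create, so the Python samples in this topic pass it through extra_body. As a minimal sketch, the helper below builds the keyword arguments for such a call. The name build_chat_kwargs is hypothetical, and the helper always sends the flag explicitly instead of relying on the default of false.

```python
def build_chat_kwargs(model: str, messages: list, enable_thinking: bool = False) -> dict:
    """Build keyword arguments for client.chat.completions.create (hypothetical helper)."""
    return {
        "model": model,
        "messages": messages,
        # Non-standard parameters go in extra_body when using the OpenAI Python SDK.
        "extra_body": {"enable_thinking": enable_thinking},
    }


kwargs = build_chat_kwargs(
    "kimi-k2.5",
    [{"role": "user", "content": "Who are you?"}],
    enable_thinking=True,
)
print(kwargs["extra_body"])  # prints "{'enable_thinking': True}"
```

The result can be unpacked into a call such as client.chat.completions.create(**kwargs), where client is an OpenAI client configured with the Model Studio base URL.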

The following examples show how to use an image URL and enable thinking mode. The main example uses a single image, while the commented-out code demonstrates multi-image input.

OpenAI compatible

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

# Example of passing a single image (thinking mode enabled)
completion = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What scene is depicted in the image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    }
                }
            ]
        }
    ],
    extra_body={"enable_thinking":True}  # Enable thinking mode
)

# Print the thinking process
if hasattr(completion.choices[0].message, 'reasoning_content') and completion.choices[0].message.reasoning_content:
    print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
    print(completion.choices[0].message.reasoning_content)

# Print the response content
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
print(completion.choices[0].message.content)

# Example of passing multiple images (thinking mode enabled, uncomment to use)
# completion = client.chat.completions.create(
#     model="kimi-k2.5",
#     messages=[
#         {
#             "role": "user",
#             "content": [
#                 {"type": "text", "text": "What do these images depict?"},
#                 {
#                     "type": "image_url",
#                     "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}
#                 },
#                 {
#                     "type": "image_url",
#                     "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}
#                 }
#             ]
#         }
#     ],
#     extra_body={"enable_thinking":True}
# )
#
# # Print the thinking process and response
# if hasattr(completion.choices[0].message, 'reasoning_content') and completion.choices[0].message.reasoning_content:
#     print("\nThinking Process:\n" + completion.choices[0].message.reasoning_content)
# print("\nComplete Response:\n" + completion.choices[0].message.content)

Node.js

import OpenAI from "openai";
import process from 'process';

const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

// Example of passing a single image (thinking mode enabled)
const completion = await openai.chat.completions.create({
    model: 'kimi-k2.5',
    messages: [
        {
            role: 'user',
            content: [
                { type: 'text', text: 'What scene is depicted in the image?' },
                {
                    type: 'image_url',
                    image_url: {
                        url: 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg'
                    }
                }
            ]
        }
    ],
    enable_thinking: true  // Enable thinking mode
});

// Print the thinking process
if (completion.choices[0].message.reasoning_content) {
    console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
    console.log(completion.choices[0].message.reasoning_content);
}

// Print the response content
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
console.log(completion.choices[0].message.content);

// Example of passing multiple images (thinking mode enabled, uncomment to use)
// const multiCompletion = await openai.chat.completions.create({
//     model: 'kimi-k2.5',
//     messages: [
//         {
//             role: 'user',
//             content: [
//                 { type: 'text', text: 'What do these images depict?' },
//                 {
//                     type: 'image_url',
//                     image_url: { url: 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg' }
//                 },
//                 {
//                     type: 'image_url',
//                     image_url: { url: 'https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png' }
//                 }
//             ]
//         }
//     ],
//     enable_thinking: true
// });
//
// // Print the thinking process and response
// if (multiCompletion.choices[0].message.reasoning_content) {
//     console.log('\nThinking Process:\n' + multiCompletion.choices[0].message.reasoning_content);
// }
// console.log('\nComplete Response:\n' + multiCompletion.choices[0].message.content);

curl

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "kimi-k2.5",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What scene is depicted in the image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    }
                }
            ]
        }
    ],
    "enable_thinking": true
}'

# Multi-image input example (uncomment to use)
# curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
# -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
# -H "Content-Type: application/json" \
# -d '{
#     "model": "kimi-k2.5",
#     "messages": [
#         {
#             "role": "user",
#             "content": [
#                 {
#                     "type": "text",
#                     "text": "What do these images depict?"
#                 },
#                 {
#                     "type": "image_url",
#                     "image_url": {
#                         "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
#                     }
#                 },
#                 {
#                     "type": "image_url",
#                     "image_url": {
#                         "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
#                     }
#                 }
#             ]
#         }
#     ],
#     "enable_thinking": true,
#     "stream": false
# }'

DashScope

Python

import os
from dashscope import MultiModalConversation

# Example of passing a single image (thinking mode enabled)
response = MultiModalConversation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "What scene is depicted in the image?"},
                {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}
            ]
        }
    ],
    enable_thinking=True  # Enable thinking mode
)

# Print the thinking process
if hasattr(response.output.choices[0].message, 'reasoning_content') and response.output.choices[0].message.reasoning_content:
    print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
    print(response.output.choices[0].message.reasoning_content)

# Print the response content
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
print(response.output.choices[0].message.content[0]["text"])

# Example of passing multiple images (thinking mode enabled, uncomment to use)
# response = MultiModalConversation.call(
#     api_key=os.getenv("DASHSCOPE_API_KEY"),
#     model="kimi-k2.5",
#     messages=[
#         {
#             "role": "user",
#             "content": [
#                 {"text": "What do these images depict?"},
#                 {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
#                 {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}
#             ]
#         }
#     ],
#     enable_thinking=True
# )
#
# # Print the thinking process and response
# if hasattr(response.output.choices[0].message, 'reasoning_content') and response.output.choices[0].message.reasoning_content:
#     print("\nThinking Process:\n" + response.output.choices[0].message.reasoning_content)
# print("\nComplete Response:\n" + response.output.choices[0].message.content[0]["text"])

Java

// DashScope SDK version >= 2.19.4
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class KimiK25MultiModalExample {
    public static void main(String[] args) {
        try {
            // Single-image input example (thinking mode enabled)
            MultiModalConversation conv = new MultiModalConversation();

            // Build the message content
            Map<String, Object> textContent = new HashMap<>();
            textContent.put("text", "What scene is depicted in the image?");

            Map<String, Object> imageContent = new HashMap<>();
            imageContent.put("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg");

            MultiModalMessage userMessage = MultiModalMessage.builder()
                    .role(Role.USER.getValue())
                    .content(Arrays.asList(textContent, imageContent))
                    .build();

            // Build the request parameters
            MultiModalConversationParam param = MultiModalConversationParam.builder()
                    // If the environment variable is not set, replace this with your Model Studio API key
                    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                    .model("kimi-k2.5")
                    .messages(Arrays.asList(userMessage))
                    .enableThinking(true)  // Enable thinking mode
                    .build();

            // Call the model
            MultiModalConversationResult result = conv.call(param);

            // Print the result
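            // The content parts are maps with Object values, so convert the text entry to a String
            String content = String.valueOf(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));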
            System.out.println("Response content: " + content);

            // If thinking mode is enabled, print the thinking process
            if (result.getOutput().getChoices().get(0).getMessage().getReasoningContent() != null) {
                System.out.println("\nThinking process: " +
                    result.getOutput().getChoices().get(0).getMessage().getReasoningContent());
            }

            // Multi-image input example (uncomment to use)
            // Map<String, Object> imageContent1 = new HashMap<>();
            // imageContent1.put("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg");
            // Map<String, Object> imageContent2 = new HashMap<>();
            // imageContent2.put("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png");
            //
            // Map<String, Object> textContent2 = new HashMap<>();
            // textContent2.put("text", "What do these images depict?");
            //
            // MultiModalMessage multiImageMessage = MultiModalMessage.builder()
            //         .role(Role.USER.getValue())
            //         .content(Arrays.asList(textContent2, imageContent1, imageContent2))
            //         .build();
            //
            // MultiModalConversationParam multiParam = MultiModalConversationParam.builder()
            //         .apiKey(System.getenv("DASHSCOPE_API_KEY"))
            //         .model("kimi-k2.5")
            //         .messages(Arrays.asList(multiImageMessage))
            //         .enableThinking(true)
            //         .build();
            //
            // MultiModalConversationResult multiResult = conv.call(multiParam);
            // System.out.println(multiResult.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));

        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.err.println("Call failed: " + e.getMessage());
        }
    }
}

curl

curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "kimi-k2.5",
    "input": {
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "text": "What scene is depicted in the image?"
                    },
                    {
                        "image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    }
                ]
            }
        ]
    },
    "parameters": {
        "enable_thinking": true
    }
}'

# Multi-image input example (uncomment to use)
# curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
# -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
# -H "Content-Type: application/json" \
# -d '{
#     "model": "kimi-k2.5",
#     "input": {
#         "messages": [
#             {
#                 "role": "user",
#                 "content": [
#                     {
#                         "text": "What do these images depict?"
#                     },
#                     {
#                         "image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
#                     },
#                     {
#                         "image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
#                     }
#                 ]
#             }
#         ]
#     },
#     "parameters": {
#         "enable_thinking": true
#     }
# }'

Video understanding

Video file

kimi-k2.5 analyzes video content by extracting frames. Control the frame extraction strategy using the following two parameters:

  • fps: Controls the frame extraction frequency. One frame is extracted every 1/fps seconds. Valid values: 0.1 to 10. Default value: 2.0.

    • For fast-moving scenes, use a higher fps value to capture more detail.

    • For static or long videos, use a lower fps value to improve processing efficiency.

  • max_frames: Limits the maximum number of frames extracted from a video. Default value and maximum value: 2000.

    If the total number of frames calculated by fps exceeds this limit, the system automatically extracts frames uniformly within the max_frames limit. This parameter is available only when using the DashScope SDK.
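The interaction between fps and max_frames can be sketched with a small calculation. This is only an estimate under stated assumptions: the helper name estimate_extracted_frames is hypothetical, and the exact rounding the service applies is not documented here, so the sketch simply rounds down.

```python
import math


def estimate_extracted_frames(duration_s: float, fps: float = 2.0, max_frames: int = 2000) -> int:
    """Estimate how many frames are sampled from a video (hypothetical helper).

    Assumes one frame every 1/fps seconds, capped at max_frames.
    The server-side rounding rule is an assumption (floor).
    """
    if not 0.1 <= fps <= 10:
        raise ValueError("fps must be between 0.1 and 10")
    if not 1 <= max_frames <= 2000:
        raise ValueError("max_frames must be between 1 and 2000")
    requested = math.floor(duration_s * fps)
    # When the requested count exceeds the cap, frames are sampled uniformly
    # within the cap, so the cap is the effective frame count.
    return min(requested, max_frames)


if __name__ == "__main__":
    print(estimate_extracted_frames(60))         # 1-minute clip at the default fps
    print(estimate_extracted_frames(3600, 2.0))  # 1-hour video hits the cap
```

For long videos the cap dominates, so lowering fps mainly reduces work for short and medium-length clips.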

OpenAI compatible

When using the OpenAI SDK or HTTP method to pass a video file directly to the model, set the "type" parameter in the user message to "video_url".

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": [
                # When passing a video file directly, set the "type" value to "video_url"
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                    },
                    "fps": 2
                },
                {
                    "type": "text",
                    "text": "What is the content of this video?"
                }
            ]
        }
    ]
)

print(completion.choices[0].message.content)

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});

async function main() {
    const response = await openai.chat.completions.create({
        model: "kimi-k2.5",
        messages: [
            {
                role: "user",
                content: [
                    // When passing a video file directly, set the "type" value to "video_url"
                    {
                        type: "video_url",
                        video_url: {
                            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                        },
                        "fps": 2
                    },
                    {
                        type: "text",
                        text: "What is the content of this video?"
                    }
                ]
            }
        ]
    });

    console.log(response.choices[0].message.content);
}

main();

curl

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kimi-k2.5",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "video_url",
            "video_url": {
              "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
            },
            "fps":2
          },
          {
            "type": "text",
            "text": "What is the content of this video?"
          }
        ]
      }
    ]
  }'

DashScope

Python

import dashscope
import os

dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

messages = [
    {"role": "user",
        "content": [
            # The fps parameter controls the frame extraction frequency, indicating one frame is extracted every 1/fps seconds
            {"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4","fps":2},
            {"text": "What is the content of this video?"}
        ]
    }
]

response = dashscope.MultiModalConversation.call(
    # If the environment variable is not set, replace the following line with your Model Studio API key: api_key ="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='kimi-k2.5',
    messages=messages
)

print(response.output.choices[0].message.content[0]["text"])

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
    
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        // The fps parameter controls the frame extraction frequency, indicating one frame is extracted every 1/fps seconds
        Map<String, Object> params = new HashMap<>();
        params.put("video", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4");
        params.put("fps", 2);
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        params,
                        Collections.singletonMap("text", "What is the content of this video?"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("kimi-k2.5")
                .messages(Arrays.asList(userMessage))
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }
    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "kimi-k2.5",
    "input":{
        "messages":[
            {"role": "user","content": [{"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4","fps":2},
            {"text": "What is the content of this video?"}]}]}
}'

Image list

When a video is passed as an image list (pre-extracted video frames), use the fps parameter to tell the model the time interval between frames. This helps the model better understand event order, duration, and dynamic changes. The fps value describes the rate at which the frames were extracted from the original video: one frame every 1/fps seconds.
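As a rough illustration of what fps conveys for an image list, the sketch below maps each frame index to a timestamp. It assumes that fps means one frame every 1/fps seconds and that the first frame corresponds to t = 0; the helper name frame_timestamps is hypothetical and is not part of any SDK.

```python
from typing import List


def frame_timestamps(num_frames: int, fps: float) -> List[float]:
    """Return the assumed timestamp (in seconds) of each frame in an image list."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    interval = 1.0 / fps  # seconds between consecutive frames
    return [round(i * interval, 3) for i in range(num_frames)]


if __name__ == "__main__":
    # Four frames extracted at fps=2 cover roughly the first 1.5 seconds.
    print(frame_timestamps(4, 2))
```

If the frames were extracted at a different rate than the fps value you pass, the model's sense of duration and speed will be skewed accordingly.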

OpenAI compatible

When using the OpenAI SDK or HTTP method to pass a video as an image list, set the "type" parameter in the user message to "video".
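The two OpenAI-compatible request shapes used in this topic ("video_url" for a video file, "video" for an image list) differ only in the content part. The following sketch builds either shape from one function. The function name video_content_part is hypothetical; the dictionary layout mirrors the examples in this topic.

```python
from typing import Any, Dict, List, Union


def video_content_part(source: Union[str, List[str]], fps: float = 2.0) -> Dict[str, Any]:
    """Build the user-message content part for a video (hypothetical helper).

    A single string is treated as a video file URL; a list is treated as
    pre-extracted frames (image list).
    """
    if isinstance(source, str):
        return {"type": "video_url", "video_url": {"url": source}, "fps": fps}
    return {"type": "video", "video": list(source), "fps": fps}


if __name__ == "__main__":
    print(video_content_part("https://example.com/clip.mp4"))
    print(video_content_part(["https://example.com/f1.jpg", "https://example.com/f2.jpg"]))
```

The returned dictionary goes into the "content" list of the user message, next to the {"type": "text", ...} part. The example.com URLs are placeholders.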

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="kimi-k2.5", 
    messages=[{"role": "user","content": [
        # When passing an image list, set the "type" parameter in the user message to "video"
         {"type": "video","video": [
         "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
         "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
         "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
         "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
         "fps":2},
         {"type": "text","text": "Describe the specific process of this video"},
    ]}]
)

print(completion.choices[0].message.content)

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});

async function main() {
    const response = await openai.chat.completions.create({
        model: "kimi-k2.5",  
        messages: [{
            role: "user",
            content: [
                {
                    // When passing an image list, set the "type" parameter in the user message to "video"
                    type: "video",
                    video: [
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
                        "fps":2
                },
                {
                    type: "text",
                    text: "Describe the specific process of this video"
                }
            ]
        }]
    });
    console.log(response.choices[0].message.content);
}

main();

curl

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "kimi-k2.5",
    "messages": [{"role": "user","content": [{"type": "video","video": [
                  "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                  "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                  "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                  "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
                  "fps":2},
                {"type": "text","text": "Describe the specific process of this video"}]}]
}'

DashScope

Python

import os
import dashscope

dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

messages = [{"role": "user",
             "content": [
                 {"video":["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
                   "fps":2},
                 {"text": "Describe the specific process of this video"}]}]
response = dashscope.MultiModalConversation.call(
    # If the environment variable is not set, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model='kimi-k2.5', 
    messages=messages
)
print(response.output.choices[0].message.content[0]["text"])

Java

// DashScope SDK version must be at least 2.21.10
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}

    private static final String MODEL_NAME = "kimi-k2.5"; 
    public static void videoImageListSample() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        Map<String, Object> params = new HashMap<>();
        params.put("video", Arrays.asList("https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"));
        params.put("fps", 2);
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        params,
                        Collections.singletonMap("text", "Describe the specific process of this video")))
                .build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(MODEL_NAME)
                .messages(Arrays.asList(userMessage)).build();
        MultiModalConversationResult result = conv.call(param);
        System.out.print(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }
    public static void main(String[] args) {
        try {
            videoImageListSample();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "kimi-k2.5",
  "input": {
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "video": [
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
            ],
            "fps": 2
          },
          {
            "text": "Describe the specific process of this video"
          }
        ]
      }
    ]
  }
}'

Pass a local file

The following examples show how to pass a local file. The OpenAI-compatible API supports only Base64 encoding. The DashScope SDK supports both Base64 encoding and file paths.

OpenAI compatible

To pass Base64-encoded data, construct a Data URL. For instructions, see Construct a Data URL.
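A Data URL has the form data:{MIME type};base64,{Base64 data}. The following is a minimal sketch of a helper that builds one; the function names are hypothetical, and guessing the MIME type from the file extension with the standard mimetypes module is an assumption made for convenience, not a requirement of the API.

```python
import base64
import mimetypes


def build_data_url(data: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a Base64 Data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_url(file_path: str) -> str:
    """Read a local file and build a Data URL, guessing the MIME type from the extension."""
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type is None:
        raise ValueError(f"cannot determine MIME type for {file_path}")
    with open(file_path, "rb") as f:
        return build_data_url(f.read(), mime_type)


if __name__ == "__main__":
    print(build_data_url(b"hi", "image/png"))
```

Pass the result wherever the examples in this section use a hand-built string such as f"data:image/png;base64,{base64_image}".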

Python

from openai import OpenAI
import os
import base64

# Encoding function: Converts a local file to a Base64 encoded string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Replace xxx/eagle.png with the absolute path of your local image
base64_image = encode_image("xxx/eagle.png")

client = OpenAI(
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"}, 
                },
                {"type": "text", "text": "What scene is depicted in the image?"},
            ],
        }
    ],
)
print(completion.choices[0].message.content)


# The following are examples for passing a local video file and an image list

#  [Local video file] Encode the local video as a Data URL and pass it in video_url:
#   def encode_video_to_data_url(video_path):
#       with open(video_path, "rb") as f:
#           return "data:video/mp4;base64," + base64.b64encode(f.read()).decode("utf-8")

#   video_data_url = encode_video_to_data_url("xxx/local.mp4")
#   content = [{"type": "video_url", "video_url": {"url": video_data_url}, "fps": 2}, {"type": "text", "text": "What is the content of this video?"}]

#  [Local image list] Encode multiple local images as Base64 and form a video list:
#   image_data_urls = [f"data:image/jpeg;base64,{encode_image(p)}" for p in ["xxx/f1.jpg", "xxx/f2.jpg", "xxx/f3.jpg", "xxx/f4.jpg"]]
#   content = [{"type": "video", "video": image_data_urls, "fps": 2}, {"type": "text", "text": "Describe the specific process of this video"}]

Node.js

import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
  };
// Replace xxx/eagle.png with the absolute path of your local image
const base64Image = encodeImage("xxx/eagle.png")
async function main() {
    const completion = await openai.chat.completions.create({
        model: "kimi-k2.5", 
        messages: [
            {"role": "user",
             "content": [{"type": "image_url",
                        "image_url": {"url": `data:image/png;base64,${base64Image}`},},
                        {"type": "text", "text": "What scene is depicted in the image?"}]}]
    });
    console.log(completion.choices[0].message.content);
}

main();

// The following are examples for passing a local video file and an image list

//  [Local video file] Encode the local video as a Data URL and pass it in video_url:
//   const encodeVideoToDataUrl = (videoPath) => "data:video/mp4;base64," + readFileSync(videoPath).toString("base64");
//   const videoDataUrl = encodeVideoToDataUrl("xxx/local.mp4");
//   const content = [{ type: "video_url", video_url: { url: videoDataUrl }, fps: 2 }, { type: "text", text: "What is the content of this video?" }];

//  [Local image list] Encode multiple local images as Base64 and form a video list:
//   const imageDataUrls = ["xxx/f1.jpg","xxx/f2.jpg","xxx/f3.jpg","xxx/f4.jpg"].map(p => `data:image/jpeg;base64,${encodeImage(p)}`);
//   const content = [{ type: "video", video: imageDataUrls, fps: 2 }, { type: "text", text: "Describe the specific process of this video" }];

//   const messages = [{ role: "user", content: content }];
//   Then call openai.chat.completions.create({ model: "kimi-k2.5", messages: messages })

DashScope

Base64 encoding method

To use Base64 encoding, construct a Data URL. For instructions, see Construct a Data URL.

Python

import base64
import os
import dashscope 
from dashscope import MultiModalConversation

dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

# Encoding function: Converts a local file to a Base64 encoded string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Replace xxx/eagle.png with the absolute path of your local image
base64_image = encode_image("xxx/eagle.png")

messages = [
    {
        "role": "user",
        "content": [
            {"image": f"data:image/png;base64,{base64_image}"},
            {"text": "What scene is depicted in the image?"},
        ],
    },
]
response = MultiModalConversation.call(
    # If the environment variable is not set, replace the following line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="kimi-k2.5", 
    messages=messages,
)
print(response.output.choices[0].message.content[0]["text"])

# The following are examples for passing a local video file and an image list

#  [Local video file]
#   video_data_url = "data:video/mp4;base64," + base64.b64encode(open("xxx/local.mp4","rb").read()).decode("utf-8")
#   content = [{"video": video_data_url, "fps": 2}, {"text": "What is the content of this video?"}]

#  [Local image list]
#   image_data_urls = [f"data:image/jpeg;base64,{encode_image(p)}" for p in ["xxx/f1.jpg","xxx/f2.jpg","xxx/f3.jpg","xxx/f4.jpg"]]
#   content = [{"video": image_data_urls, "fps": 2}, {"text": "Describe the specific process of this video"}]

Java

import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Base64;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import com.alibaba.dashscope.aigc.multimodalconversation.*;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {

   static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}

    private static String encodeToBase64(String imagePath) throws IOException {
        Path path = Paths.get(imagePath);
        byte[] imageBytes = Files.readAllBytes(path);
        return Base64.getEncoder().encodeToString(imageBytes);
    }
    

    public static void callWithLocalFile(String localPath) throws ApiException, NoApiKeyException, UploadFileException, IOException {

        String base64Image = encodeToBase64(localPath); // Base64 encoding

        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        new HashMap<String, Object>() {{ put("image", "data:image/png;base64," + base64Image); }},
                        new HashMap<String, Object>() {{ put("text", "What scene is depicted in the image?"); }}
                )).build();

        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("kimi-k2.5")
                .messages(Arrays.asList(userMessage))
                .build();

        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            // Replace xxx/eagle.png with the absolute path of your local image
            callWithLocalFile("xxx/eagle.png");
        } catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
    
    // The following are examples for passing a local video file and an image list
    // (the image list example also requires: import java.util.List;)

    //  [Local video file]
    //  String base64Video = encodeToBase64(localPath);
    //  MultiModalConversation conv = new MultiModalConversation();
    //  MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //             .content(Arrays.asList(
    //                     new HashMap<String, Object>() {{ put("video", "data:video/mp4;base64," + base64Video); }},
    //                     new HashMap<String, Object>() {{ put("text", "What is the content of this video?"); }}
    //             )).build();

    //  [Local image list]
    //  List<String> urls = Arrays.asList(
    //                                   "data:image/jpeg;base64," + encodeToBase64("path/f1.jpg"),
    //                                   "data:image/jpeg;base64," + encodeToBase64("path/f2.jpg"),
    //                                   "data:image/jpeg;base64," + encodeToBase64("path/f3.jpg"),
    //                                   "data:image/jpeg;base64," + encodeToBase64("path/f4.jpg"));
    //  MultiModalConversation conv = new MultiModalConversation();
    //  MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //             .content(Arrays.asList(
    //                     new HashMap<String, Object>() {{ put("video", urls); }},
    //                     new HashMap<String, Object>() {{ put("text", "Describe the specific process of this video"); }}
    //             )).build();

}

Local file path method

Pass the local file path directly to the model. This method is supported only by the DashScope Python and Java SDKs. It is not supported by DashScope HTTP or the OpenAI-compatible method. Refer to the following table to specify the file path based on your programming language and operating system.

Specify a file path (image example)

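| System | SDK | File path to pass | Example |
| --- | --- | --- | --- |
| Linux or macOS | Python SDK | file://{absolute_path_of_the_file} | file:///home/images/test.png |
| Linux or macOS | Java SDK | file://{absolute_path_of_the_file} | file:///home/images/test.png |
| Windows | Python SDK | file://{absolute_path_of_the_file} | file://D:/images/test.png |
| Windows | Java SDK | file:///{absolute_path_of_the_file} | file:///D:/images/test.png |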
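The prefix rules in the table can be captured in a small helper. This is a sketch only: the function name local_file_uri and its system and sdk arguments are hypothetical, and converting backslashes to forward slashes is an assumption based on the Windows examples in the table.

```python
def local_file_uri(absolute_path: str, system: str, sdk: str) -> str:
    """Build the file path string to pass to the DashScope SDK (hypothetical helper).

    system: "linux", "macos", or "windows"
    sdk:    "python" or "java"
    """
    system, sdk = system.lower(), sdk.lower()
    if system not in {"linux", "macos", "windows"}:
        raise ValueError(f"unknown system: {system}")
    if sdk not in {"python", "java"}:
        raise ValueError(f"unknown sdk: {sdk}")
    # Assumption: forward slashes are expected, as in the table's examples.
    normalized = absolute_path.replace("\\", "/")
    # Only the Java SDK on Windows uses three slashes before the drive letter.
    prefix = "file:///" if (system == "windows" and sdk == "java") else "file://"
    return prefix + normalized


if __name__ == "__main__":
    print(local_file_uri("/home/images/test.png", "linux", "python"))
    print(local_file_uri("D:/images/test.png", "windows", "java"))
```

On Linux and macOS the absolute path already starts with a slash, which is why the resulting string contains three slashes even though the prefix is file://.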

Python

import os
from dashscope import MultiModalConversation
import dashscope 

dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

# Replace xxx/eagle.png with the absolute path of your local image
local_path = "xxx/eagle.png"
image_path = f"file://{local_path}"
messages = [
                {'role':'user',
                'content': [{'image': image_path},
                            {'text': 'What scene is depicted in the image?'}]}]
response = MultiModalConversation.call(
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='kimi-k2.5',  
    messages=messages)
print(response.output.choices[0].message.content[0]["text"])

# The following are examples for passing a video and image list using local file paths
#  [Local video file]
#  video_path = "file:///path/to/local.mp4"
#  content = [{"video": video_path, "fps": 2}, {"text": "What is the content of this video?"}]

#  [Local image list]
#  image_paths = ["file:///path/f1.jpg", "file:///path/f2.jpg", "file:///path/f3.jpg", "file:///path/f4.jpg"]
#  content = [{"video": image_paths, "fps": 2}, {"text": "Describe the specific process of this video"}]

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {

    static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
    
    public static void callWithLocalFile(String localPath)
            throws ApiException, NoApiKeyException, UploadFileException {
        String filePath = "file://"+localPath;
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(new HashMap<String, Object>(){{put("image", filePath);}},
                        new HashMap<String, Object>(){{put("text", "What scene is depicted in the image?");}})).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("kimi-k2.5")  
                .messages(Arrays.asList(userMessage))
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));}

    public static void main(String[] args) {
        try {
            // Replace xxx/eagle.png with the absolute path of your local image
            callWithLocalFile("xxx/eagle.png");
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
    
    // The following are examples for passing a video and image list using local file paths
    
    //  [Local video file]
    //    String filePath = "file://"+localPath;
    //    MultiModalConversation conv = new MultiModalConversation();
    //    MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //            .content(Arrays.asList(new HashMap<String, Object>(){{put("video", filePath);}},
    //                    new HashMap<String, Object>(){{put("text", "What is the content of this video?");}})).build();

    //  [Local image list]
    //    MultiModalConversation conv = new MultiModalConversation();
    //    List<String> filePaths = Arrays.asList("file:///path/f1.jpg", "file:///path/f2.jpg", "file:///path/f3.jpg", "file:///path/f4.jpg");
    //    MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //            .content(Arrays.asList(new HashMap<String, Object>(){{put("video", filePaths);}},
    //                    new HashMap<String, Object>(){{put("text", "Describe the specific process of this video");}})).build();
}

File limitations

Image limitations

  • Image resolution:

    • Minimum size: The width and height of the image must both be greater than 10 pixels.

    • Aspect ratio: The ratio of the long side to the short side of the image must not exceed 200:1.

    • Pixel limit: We recommend that you keep the image resolution within 8K (7680×4320). Images that exceed this resolution may cause API calls to time out because of large file sizes and long network transmission times.

  • Supported image formats

    • For resolutions below 4K (3840×2160), the following image formats are supported:

      | Image format | Common file extensions | MIME type |
      | --- | --- | --- |
      | BMP | .bmp | image/bmp |
      | JPEG | .jpe, .jpeg, .jpg | image/jpeg |
      | PNG | .png | image/png |
      | TIFF | .tif, .tiff | image/tiff |
      | WEBP | .webp | image/webp |
      | HEIC | .heic | image/heic |

    • For resolutions between 4K (3840×2160) and 8K (7680×4320), only the JPEG (.jpe, .jpeg, .jpg) and PNG formats are supported.

  • Image size:

    • When passing a public URL or local path: The size of a single image cannot exceed 10 MB.

    • When passing a Base64-encoded string: The size of the encoded string cannot exceed 10 MB.

    To reduce the file size, see How to compress an image or video to the required size.

  • Number of supported images: The number of images that you can pass is limited by the model's maximum input tokens. The total number of tokens for all images and text combined must be less than this limit.
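
The image limits above can be checked on the client before a request is sent. The following Python snippet is a minimal sketch, not part of any SDK: the function names and constants are hypothetical. It assumes that 1 MB equals 1024 × 1024 bytes, that "resolution" is compared by total pixel count, and that the JPEG/PNG restriction also applies above 8K. It ignores any data-URL prefix that a Base64 string might carry.

```python
# Hypothetical client-side pre-check for the image limits listed above.
MAX_IMAGE_BYTES = 10 * 1024 * 1024   # assumption: 1 MB = 1024 * 1024 bytes
FOUR_K_PIXELS = 3840 * 2160          # assumption: resolution compared by pixel count

FORMATS_BELOW_4K = {".bmp", ".jpe", ".jpeg", ".jpg", ".png",
                    ".tif", ".tiff", ".webp", ".heic"}
FORMATS_4K_AND_ABOVE = {".jpe", ".jpeg", ".jpg", ".png"}


def base64_length(raw_bytes):
    """Length of the padded Base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def check_image(width, height, size_bytes, extension, as_base64=False):
    """Return a list of violated limits; an empty list means no violation found."""
    problems = []
    if width <= 10 or height <= 10:
        problems.append("width and height must both be greater than 10 pixels")
    long_side, short_side = max(width, height), min(width, height)
    if long_side > 200 * short_side:
        problems.append("aspect ratio must not exceed 200:1")
    if width * height >= FOUR_K_PIXELS:
        allowed = FORMATS_4K_AND_ABOVE
    else:
        allowed = FORMATS_BELOW_4K
    if extension.lower() not in allowed:
        problems.append("format not supported at this resolution")
    # For Base64 input the limit applies to the encoded string, which is
    # about one third larger than the raw file.
    payload = base64_length(size_bytes) if as_base64 else size_bytes
    if payload > MAX_IMAGE_BYTES:
        problems.append("payload exceeds 10 MB")
    return problems


print(check_image(1920, 1080, 2_000_000, ".jpg"))
print(check_image(4000, 3000, 1_000, ".webp"))
```

Because Base64 encoding expands data by a factor of about 4/3, a raw file larger than roughly 7.5 MB already exceeds the 10 MB limit once it is encoded.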

Video limitations

  • Passed as an image list: Minimum of 4 images, maximum of 2000 images.

  • Passed as a video file:

    • Video size:

      • When passed as a public URL: Up to 2 GB.

      • When passed as a Base64-encoded string: Less than 10 MB.

      • When passed as a local file path: The video itself must not exceed 100 MB.

    • Video duration: 2 seconds to 1 hour.

  • Video format: MP4, AVI, MKV, MOV, FLV, WMV, etc.

  • Video resolution: No specific limit. We recommend keeping it under 2K. Higher resolutions increase processing time without improving model understanding.

  • Audio understanding: Not supported. The model does not process the audio track of a video file.
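
The size, duration, and image-count limits above depend on how the video is passed. The following Python snippet is a minimal sketch of a client-side pre-check. The function names and the source labels "url", "base64", and "local" are hypothetical. It assumes binary units (1 MB = 1024 × 1024 bytes) and that, for Base64 input, the limit applies to the length of the encoded string.

```python
# Hypothetical pre-check for the video limits listed above.
MB = 1024 * 1024   # assumption: binary units
GB = 1024 * MB


def check_video_file(source, size_bytes, duration_seconds):
    """Return a list of violated limits for a video passed as a file."""
    problems = []
    if source == "url":
        if size_bytes > 2 * GB:            # "up to 2 GB"
            problems.append("video exceeds 2 GB")
    elif source == "base64":
        if size_bytes >= 10 * MB:          # "less than 10 MB"
            problems.append("encoded video must be less than 10 MB")
    elif source == "local":
        if size_bytes > 100 * MB:          # "must not exceed 100 MB"
            problems.append("video exceeds 100 MB")
    else:
        raise ValueError("source must be 'url', 'base64', or 'local'")
    if not 2 <= duration_seconds <= 3600:
        problems.append("duration must be between 2 seconds and 1 hour")
    return problems


def check_image_list(frames):
    """An image list that stands in for a video needs 4 to 2000 images."""
    return 4 <= len(frames) <= 2000


print(check_video_file("local", 50 * MB, 30))
print(check_image_list(["f1.jpg", "f2.jpg", "f3.jpg"]))
```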

Model features

| Model | Multi-turn conversation | Deep thinking | Function calling | Structured output | Web search | Partial mode | Context cache |
| --- | --- | --- | --- | --- | --- | --- | --- |
| kimi-k2.5 | Supported | Supported | Supported | Not supported | Not supported | Not supported | Not supported |
| kimi-k2-thinking | Supported | Supported | Supported | Supported | Not supported | Not supported | Not supported |
| Moonshot-Kimi-K2-Instruct | Supported | Not supported | Supported | Not supported | Supported | Not supported | Not supported |
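
An application that switches between these models may want to check a feature before it builds a request. The following Python snippet is a minimal sketch that encodes the feature list above as a lookup. The feature keys, such as "structured_output", are hypothetical labels chosen for this example and are not API parameter names.

```python
# Supported features per model, transcribed from the feature list above.
# The feature keys are hypothetical labels, not API parameters.
SUPPORTED_FEATURES = {
    "kimi-k2.5": {
        "multi_turn", "deep_thinking", "function_calling",
    },
    "kimi-k2-thinking": {
        "multi_turn", "deep_thinking", "function_calling", "structured_output",
    },
    "Moonshot-Kimi-K2-Instruct": {
        "multi_turn", "function_calling", "web_search",
    },
}


def supports(model, feature):
    """Return True if the feature list marks the feature as supported."""
    return feature in SUPPORTED_FEATURES[model]


for name in SUPPORTED_FEATURES:
    print(name, "structured output:", supports(name, "structured_output"))
```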

Default parameter values

| Model | enable_thinking | temperature | top_p | presence_penalty | fps | max_frames |
| --- | --- | --- | --- | --- | --- | --- |
| kimi-k2.5 | false | Thinking mode: 1.0; non-thinking mode: 0.6 | Thinking/non-thinking mode: 0.95 | Thinking/non-thinking mode: 0.0 | 2 | 2000 |
| kimi-k2-thinking | - | 1.0 | - | - | - | - |
| Moonshot-Kimi-K2-Instruct | - | 0.6 | 1.0 | 0 | - | - |

A hyphen (-) indicates that there is no default value and the parameter cannot be set.
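
The defaults above can be mirrored on the client to compute the effective values and to catch parameters that a model does not accept. The following Python snippet is a minimal sketch. The build_params helper and the DEFAULTS structure are hypothetical and cover only the six parameters listed above. Any other parameter is passed through unchanged.

```python
# Default values transcribed from the list above. A parameter missing from a
# model's entry corresponds to a hyphen: no default, and it cannot be set.
TABLE_PARAMS = {"enable_thinking", "temperature", "top_p",
                "presence_penalty", "fps", "max_frames"}

DEFAULTS = {
    "kimi-k2.5": {
        "enable_thinking": False,
        "temperature": {"thinking": 1.0, "non_thinking": 0.6},
        "top_p": 0.95,
        "presence_penalty": 0.0,
        "fps": 2,
        "max_frames": 2000,
    },
    "kimi-k2-thinking": {"temperature": 1.0},
    "Moonshot-Kimi-K2-Instruct": {
        "temperature": 0.6, "top_p": 1.0, "presence_penalty": 0,
    },
}


def build_params(model, **overrides):
    """Merge overrides into the model defaults (hypothetical helper)."""
    defaults = DEFAULTS[model]
    rejected = sorted((set(overrides) & TABLE_PARAMS) - set(defaults))
    if rejected:
        raise ValueError(f"{model} does not accept: {', '.join(rejected)}")
    params = dict(defaults)
    params.update(overrides)
    # The kimi-k2.5 default temperature depends on the thinking mode.
    temperature = params["temperature"]
    if isinstance(temperature, dict):
        mode = "thinking" if params.get("enable_thinking") else "non_thinking"
        params["temperature"] = temperature[mode]
    return params


print(build_params("kimi-k2.5", enable_thinking=True))
print(build_params("Moonshot-Kimi-K2-Instruct", top_p=0.8))
```

Raising an error for a hyphenated parameter is a design choice in this sketch. It surfaces the problem on the client instead of relying on how the service responds to the parameter.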

Error codes

If a model call fails and returns an error message, see Error messages to troubleshoot the issue.