Alibaba Cloud Model Studio: Kimi

Last Updated: Jun 16, 2026

This document describes how to call the Kimi model inference service deployed on Alibaba Cloud Model Studio.

Important

Moonshot-Kimi-K2-Instruct and kimi-k2-thinking will be retired on July 9, 2026. We recommend migrating to qwen3.7-plus, qwen3.7-max, or qwen3.6-flash.

Supported regions: China (Beijing), China (Hong Kong), Germany (Frankfurt), and US (Virginia).

Model experience: You can try the Kimi model in the model trial center.

Service endpoints are region-specific. Configure the correct base URL for your region.

OpenAI compatible

US (Virginia)

The base_url for SDK calls is: https://dashscope-us.aliyuncs.com/compatible-mode/v1

HTTP request URL: POST https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions

Germany (Frankfurt)

The base_url for SDK calls is: https://{WorkspaceId}.eu-central-1.maas.aliyuncs.com/compatible-mode/v1

HTTP request URL: POST https://{WorkspaceId}.eu-central-1.maas.aliyuncs.com/compatible-mode/v1/chat/completions

When you make a call, replace WorkspaceId with your actual Workspace ID.

China (Beijing)

The base_url for SDK calls is: https://dashscope.aliyuncs.com/compatible-mode/v1

HTTP request URL: POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions

China (Hong Kong)

The base_url for SDK calls is: https://{WorkspaceId}.cn-hongkong.maas.aliyuncs.com/compatible-mode/v1

HTTP request URL: POST https://{WorkspaceId}.cn-hongkong.maas.aliyuncs.com/compatible-mode/v1/chat/completions

When you make a call, replace WorkspaceId with your actual Workspace ID.
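
The region-to-URL mapping above can be captured in a small lookup helper. This is a minimal sketch: the function name `compatible_base_url` and the region keys (`us-virginia`, `eu-frankfurt`, and so on) are hypothetical names chosen for illustration and are not part of any SDK. Only the URL patterns come from the list above.

```python
# Hypothetical helper: the region keys and function name are illustrative only.
# The URL templates are copied from the OpenAI compatible endpoint list.
COMPATIBLE_BASE_URLS = {
    "us-virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
    "eu-frankfurt": "https://{WorkspaceId}.eu-central-1.maas.aliyuncs.com/compatible-mode/v1",
    "cn-beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "cn-hongkong": "https://{WorkspaceId}.cn-hongkong.maas.aliyuncs.com/compatible-mode/v1",
}


def compatible_base_url(region, workspace_id=None):
    """Return the OpenAI compatible base_url for a region key."""
    template = COMPATIBLE_BASE_URLS[region]
    if "{WorkspaceId}" in template:
        # Frankfurt and Hong Kong endpoints embed the Workspace ID in the host name.
        if not workspace_id:
            raise ValueError(f"Region {region!r} requires a Workspace ID")
        return template.replace("{WorkspaceId}", workspace_id)
    return template
```

The returned string can be passed as `base_url` when constructing the OpenAI client, as in the Get started examples.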

DashScope

US (Virginia)

The HTTP request URL for text models, such as kimi-k2-thinking, is POST https://dashscope-us.aliyuncs.com/api/v1/services/aigc/text-generation/generation

The HTTP request URL for multimodal models, such as kimi-k2.6 and kimi-k2.5, is POST https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation

The base_url for SDK calls is:

Python code

dashscope.base_http_api_url = 'https://dashscope-us.aliyuncs.com/api/v1'

Java code

  • Method 1:

    import com.alibaba.dashscope.protocol.Protocol;
    Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-us.aliyuncs.com/api/v1");
  • Method 2:

    import com.alibaba.dashscope.utils.Constants;
    Constants.baseHttpApiUrl="https://dashscope-us.aliyuncs.com/api/v1";

Germany (Frankfurt)

The HTTP request URL for text models, such as kimi-k2-thinking, is POST https://{WorkspaceId}.eu-central-1.maas.aliyuncs.com/api/v1/services/aigc/text-generation/generation

The HTTP request URL for multimodal models, such as kimi-k2.7-code, kimi-k2.6, and kimi-k2.5, is POST https://{WorkspaceId}.eu-central-1.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation

When you make a call, replace WorkspaceId with your actual Workspace ID.

The base_url for SDK calls is:

Python code

dashscope.base_http_api_url = 'https://{WorkspaceId}.eu-central-1.maas.aliyuncs.com/api/v1'

Java code

  • Method 1:

    import com.alibaba.dashscope.protocol.Protocol;
    Generation gen = new Generation(Protocol.HTTP.getValue(), "https://{WorkspaceId}.eu-central-1.maas.aliyuncs.com/api/v1");
  • Method 2:

    import com.alibaba.dashscope.utils.Constants;
    Constants.baseHttpApiUrl="https://{WorkspaceId}.eu-central-1.maas.aliyuncs.com/api/v1";

China (Hong Kong)

The HTTP request URL for text models, such as kimi-k2-thinking, is POST https://{WorkspaceId}.cn-hongkong.maas.aliyuncs.com/api/v1/services/aigc/text-generation/generation

The HTTP request URL for multimodal models, such as kimi-k2.7-code, kimi-k2.6, and kimi-k2.5, is POST https://{WorkspaceId}.cn-hongkong.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation

When you make a call, replace WorkspaceId with your actual Workspace ID.

The base_url for SDK calls is:

Python code

dashscope.base_http_api_url = 'https://{WorkspaceId}.cn-hongkong.maas.aliyuncs.com/api/v1'

Java code

  • Method 1:

    import com.alibaba.dashscope.protocol.Protocol;
    Generation gen = new Generation(Protocol.HTTP.getValue(), "https://{WorkspaceId}.cn-hongkong.maas.aliyuncs.com/api/v1");
  • Method 2:

    import com.alibaba.dashscope.utils.Constants;
    Constants.baseHttpApiUrl="https://{WorkspaceId}.cn-hongkong.maas.aliyuncs.com/api/v1";

China (Beijing)

The HTTP request URL for text models, such as kimi-k2-thinking, is POST https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation

The HTTP request URL for multimodal models, such as kimi-k2.7-code, kimi-k2.6, and kimi-k2.5, is POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation

You do not need to configure the base_url for SDK calls.
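
The DashScope HTTP URLs above differ in two ways: the API root depends on the region, and the service path depends on whether the model is a text or multimodal model. The sketch below composes the two. The function name `dashscope_generation_url` and the region keys are hypothetical names used for illustration; the URL patterns are taken from the list above.

```python
# Hypothetical helper: the region keys and function name are illustrative only.
DASHSCOPE_API_ROOTS = {
    "us-virginia": "https://dashscope-us.aliyuncs.com/api/v1",
    "eu-frankfurt": "https://{WorkspaceId}.eu-central-1.maas.aliyuncs.com/api/v1",
    "cn-hongkong": "https://{WorkspaceId}.cn-hongkong.maas.aliyuncs.com/api/v1",
    "cn-beijing": "https://dashscope.aliyuncs.com/api/v1",
}


def dashscope_generation_url(region, multimodal, workspace_id=None):
    """Return the DashScope HTTP request URL for a region and model type."""
    root = DASHSCOPE_API_ROOTS[region]
    if "{WorkspaceId}" in root:
        if not workspace_id:
            raise ValueError(f"Region {region!r} requires a Workspace ID")
        root = root.replace("{WorkspaceId}", workspace_id)
    # Text models use text-generation; multimodal models use multimodal-generation.
    service = "multimodal-generation" if multimodal else "text-generation"
    return f"{root}/services/aigc/{service}/generation"
```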

Prerequisites: Get an API key and set it as an environment variable (the examples in this document read it from DASHSCOPE_API_KEY). If you call the model through an SDK, install the SDK first.
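
Before running the examples, it can help to fail fast when the API key is missing rather than wait for an authentication error from the service. The following is a minimal sketch; `require_api_key` is a hypothetical helper name, and the variable name `DASHSCOPE_API_KEY` is the one the examples in this document read.

```python
import os


def require_api_key(env_var="DASHSCOPE_API_KEY"):
    """Return the API key from the environment, or raise if it is not set."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {env_var} is not set; "
            "set it to your Model Studio API key before calling the service."
        )
    return key
```

The return value can be passed as `api_key` to the OpenAI client or to the DashScope SDK call.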

Get started

The following examples use text-only input. For multimodal examples, see Multimodal calls.

OpenAI compatible

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "Who are you?"}],
    stream=True,
)

reasoning_content = ""  # Complete thinking process
answer_content = ""     # Complete response
is_answering = False    # Tracks if the main response has started.

print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")

for chunk in completion:
    if chunk.choices:
        delta = chunk.choices[0].delta
        # Store content from the thinking process.
        if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
            if not is_answering:
                print(delta.reasoning_content, end="", flush=True)
            reasoning_content += delta.reasoning_content
        # Start printing the main response once its content arrives.
        if hasattr(delta, "content") and delta.content:
            if not is_answering:
                print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
                is_answering = True
            print(delta.content, end="", flush=True)
            answer_content += delta.content

Response

====================Thinking Process====================

The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.

I am Kimi, an AI assistant developed by Moonshot AI. I should introduce myself clearly and concisely, including:
1. My identity: AI assistant
2. My developer: Moonshot AI
3. My name: Kimi
4. My core capabilities: long-text processing, intelligent conversation, file processing, search, etc.

I should maintain a friendly and professional tone, avoiding overly technical terms for clarity. I should also emphasize that I am an AI without personal consciousness, emotions, or experiences to prevent misunderstandings.

Response structure:
- Directly state my identity
- Mention my developer
- Briefly introduce core capabilities
- Keep it clear and concise
====================Complete Response====================

I am Kimi, an AI assistant developed by Moonshot AI. I am based on a Mixture-of-Experts (MoE) architecture and have capabilities such as ultra-long context understanding, intelligent conversation, file processing, code generation, and complex task reasoning. How can I help you?

Node.js

import OpenAI from "openai";
import process from 'process';

// Initialize the OpenAI client
const openai = new OpenAI({
    // If not using an environment variable, replace `process.env.DASHSCOPE_API_KEY` with your API key string (e.g., "sk-xxx").
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = ''; // Complete thinking process
let answerContent = ''; // Complete response
let isAnswering = false; // Tracks if the main response has started.

async function main() {
    const messages = [{ role: 'user', content: 'Who are you?' }];

    const stream = await openai.chat.completions.create({
        model: 'kimi-k2.6',
        messages,
        stream: true,
    });

    console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');

    for await (const chunk of stream) {
        if (chunk.choices?.length) {
            const delta = chunk.choices[0].delta;
            // Store content from the thinking process.
            if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
                if (!isAnswering) {
                    process.stdout.write(delta.reasoning_content);
                }
                reasoningContent += delta.reasoning_content;
            }

            // Start printing the main response once its content arrives.
            if (delta.content !== undefined && delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    }
}

main();

Response

====================Thinking Process====================

The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.

I am Kimi, an AI assistant developed by Moonshot AI. I should introduce myself clearly and concisely, including:
1. My identity: AI assistant
2. My developer: Moonshot AI
3. My name: Kimi
4. My core capabilities: long-text processing, intelligent conversation, file processing, search, etc.

I should maintain a friendly and professional tone and avoid overly technical terms for clarity. I should also emphasize that I am an AI without personal consciousness, emotions, or experiences to prevent misunderstandings.

Response structure:
- Directly state my identity
- Mention my developer
- Briefly introduce core capabilities
- Keep it clear and concise
====================Complete Response====================

I am Kimi, an AI assistant developed by Moonshot AI.

I am skilled in:
- Long-text understanding and generation
- Intelligent conversation and question answering
- File processing and analysis
- Information retrieval and integration

As an AI assistant, I do not have personal consciousness, emotions, or experiences, but I am designed to provide accurate and helpful assistance. How can I help you?

HTTP

curl

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "kimi-k2.6",
    "messages": [
        {
            "role": "user",
            "content": "Who are you?"
        }
    ]
}'

Response

{
    "choices": [
        {
            "message": {
                "content": "I am Kimi, an AI assistant developed by Moonshot AI. I am skilled in long-text processing, intelligent conversation, file analysis, programming assistance, and complex task reasoning. I can help you answer questions, create content, and analyze documents. How can I assist you?",
                "reasoning_content": "The user asks \"Who are you?\", which is a direct question about my identity. I must answer truthfully based on my actual identity.\n\nI am Kimi, an AI assistant developed by Moonshot AI. I should introduce myself clearly and concisely, including:\n1. My identity: AI assistant\n2. My developer: Moonshot AI\n3. My name: Kimi\n4. My core capabilities: long-text processing, intelligent conversation, file processing, search, etc.\n\nI should maintain a friendly and professional tone while providing useful information. No need to overcomplicate; a direct answer is sufficient.",
                "role": "assistant"
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 8,
        "completion_tokens": 183,
        "total_tokens": 191
    },
    "created": 1762753998,
    "system_fingerprint": null,
    "model": "kimi-k2.6",
    "id": "chatcmpl-485ab490-90ec-48c3-85fa-1c732b683db2"
}
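
In a non-streaming response such as the one above, the thinking process and the final answer arrive as two fields of the same message. The sketch below separates them. `split_message` is a hypothetical helper name, and `sample_response` is a trimmed copy of the response above with long strings shortened.

```python
# Trimmed copy of the sample response above; long strings are shortened with "...".
sample_response = {
    "choices": [
        {
            "message": {
                "content": "I am Kimi, an AI assistant developed by Moonshot AI. ...",
                "reasoning_content": "The user asks \"Who are you?\", ...",
                "role": "assistant",
            },
            "finish_reason": "stop",
            "index": 0,
        }
    ],
    "usage": {"prompt_tokens": 8, "completion_tokens": 183, "total_tokens": 191},
    "model": "kimi-k2.6",
}


def split_message(response):
    """Return (reasoning, answer) from a non-streaming chat completion body."""
    message = response["choices"][0]["message"]
    # reasoning_content may be missing or null when the model replies directly.
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    return reasoning, answer


reasoning, answer = split_message(sample_response)
print("Thinking:", reasoning)
print("Answer:", answer)
```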

DashScope

The following DashScope examples use the multimodal-generation endpoint to call kimi-k2.6, which supports both text and multimodal input. For more multimodal examples, see Multimodal calls.

Python

import os
from dashscope import MultiModalConversation

# Define the request messages.
messages = [{"role": "user", "content": "Who are you?"}]

completion = MultiModalConversation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # If not using an environment variable, provide your key directly, e.g., api_key="sk-xxx"
    model="kimi-k2.6",
    messages=messages,
    result_format="message",  # Set the result format to message
    stream=True,              # Enable streaming.
    incremental_output=True,  # Enable incremental output
)

reasoning_content = ""  # Complete thinking process
answer_content = ""     # Complete response
is_answering = False    # Tracks if the main response has started.

print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")

for chunk in completion:
    message = chunk.output.choices[0].message
    
    # Store content from the thinking process.
    if message.reasoning_content:
        if not is_answering:
            print(message.reasoning_content, end="", flush=True)
        reasoning_content += message.reasoning_content

    # Start printing the main response once its content arrives.
    if message.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
            is_answering = True
        print(message.content, end="", flush=True)
        answer_content += message.content

Response

====================Thinking Process====================

The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.

I am Kimi, an AI assistant developed by Moonshot AI. I should state this clearly and concisely.

Key information to include:
1. My name: Kimi
2. My developer: Moonshot AI
3. My nature: AI assistant
4. What I can do: answer questions, assist with content creation, etc.

I should maintain a friendly and helpful tone while accurately stating my identity. I should not pretend to be human or have a personal identity.

A suitable response would be:
"I am Kimi, an AI assistant developed by Moonshot AI. I can help you with a variety of tasks such as answering questions, creating content, and analyzing documents. How can I help you?"

This response is direct, accurate, and encourages further interaction.
====================Complete Response====================

I am Kimi, an AI assistant developed by Moonshot AI. I can help you with a variety of tasks such as answering questions, creating content, and analyzing documents. How can I help you?

Java

// Requires DashScope SDK v2.19.4 or later.
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import java.util.Arrays;
import java.util.Collections;

public class Main {
    public static void main(String[] args) {
        try {
            MultiModalConversation conv = new MultiModalConversation();

            MultiModalMessage userMsg = MultiModalMessage.builder()
                    .role(Role.USER.getValue())
                    .content(Arrays.asList(Collections.singletonMap("text", "Who are you?")))
                    .build();

            MultiModalConversationParam param = MultiModalConversationParam.builder()
                    // If not using an environment variable, replace the following line with your API key, e.g., .apiKey("sk-xxx")
                    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                    .model("kimi-k2.6")
                    .messages(Arrays.asList(userMsg))
                    .build();

            MultiModalConversationResult result = conv.call(param);

            // Convert the "text" entry of the first content item to a String before printing.
            String content = String.valueOf(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
            System.out.println("Response: " + content);
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.err.println("An exception occurred: " + e.getMessage());
        }
        System.exit(0);
    }
}

Response

====================Thinking Process====================
The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.

I am Kimi, an AI assistant developed by Moonshot AI. I should state this clearly and concisely.

The response should include:
1. My identity: AI assistant
2. My developer: Moonshot AI
3. My name: Kimi
4. My core capabilities: long-text processing, intelligent conversation, file processing, etc.

I should not pretend to be human or provide excessive technical details. A clear and friendly answer is sufficient.
====================Complete Response====================
I am Kimi, an AI assistant developed by Moonshot AI. My skills include long-text processing, intelligent conversation, question answering, content creation, and file analysis and processing. How can I assist you?

HTTP

curl

curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "kimi-k2.6",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": "Who are you?"
            }
        ]
    },
    "parameters": {
        "result_format": "message"
    }
}'

Response

{
    "output": {
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "content": "I am Kimi, an AI assistant developed by Moonshot AI. I can help you answer questions, create content, analyze documents, and write code. How can I help you?",
                    "reasoning_content": "The user asks \"Who are you?\", which is a direct question about my identity. I need to answer truthfully based on my actual identity.\n\nI am Kimi, an AI assistant developed by Moonshot AI. I should state this clearly and concisely.\n\nKey information to include:\n1. My name: Kimi\n2. My developer: Moonshot AI\n3. My nature: AI assistant\n4. What I can do: answer questions, assist with content creation, etc.\n\nThe response should be friendly, direct, and easy to understand.",
                    "role": "assistant"
                }
            }
        ]
    },
    "usage": {
        "input_tokens": 9,
        "output_tokens": 156,
        "total_tokens": 165
    },
    "request_id": "709a0697-ed1f-4298-82c9-a4b878da1849"
}
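
The two interfaces report token usage under different field names: the OpenAI compatible response shown earlier uses `prompt_tokens` and `completion_tokens`, while the DashScope response above uses `input_tokens` and `output_tokens`. If an application calls both, a small adapter can map them to one shape. This is a sketch; `normalize_usage` and its output keys are hypothetical names, and the sample values are the usage blocks from the two responses in this section.

```python
def normalize_usage(response):
    """Map the usage block of either response format to common keys."""
    usage = response.get("usage", {})
    return {
        # OpenAI compatible: prompt_tokens / DashScope: input_tokens
        "input": usage.get("prompt_tokens", usage.get("input_tokens")),
        # OpenAI compatible: completion_tokens / DashScope: output_tokens
        "output": usage.get("completion_tokens", usage.get("output_tokens")),
        "total": usage.get("total_tokens"),
    }


openai_style = {"usage": {"prompt_tokens": 8, "completion_tokens": 183, "total_tokens": 191}}
dashscope_style = {"usage": {"input_tokens": 9, "output_tokens": 156, "total_tokens": 165}}

print(normalize_usage(openai_style))     # {'input': 8, 'output': 183, 'total': 191}
print(normalize_usage(dashscope_style))  # {'input': 9, 'output': 156, 'total': 165}
```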

Multimodal calls

The kimi-k2.7-code, kimi-k2.6, and kimi-k2.5 models can process text together with images or video in the same request. Use the enable_thinking parameter to enable thinking mode. The following examples show how to use these capabilities.

Enable or disable thinking mode

kimi-k2.6 and kimi-k2.5 are hybrid thinking models: they can either think before replying or reply directly. Use the enable_thinking parameter to control thinking mode:

  • true: Enables thinking mode.

  • false (default): Disables thinking mode.

kimi-k2.7-code is a thinking-only model: thinking mode is always enabled (enable_thinking defaults to true and cannot be disabled), and preserve_thinking defaults to true.

kimi-k2.6 supports passing the thinking process in multi-turn conversations by using the preserve_thinking parameter. For more information, see Pass the thinking process.
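
The rules above (hybrid models accept enable_thinking, while kimi-k2.7-code is thinking-only) can be checked on the client before a request is sent. The following is a minimal sketch under those stated rules. `thinking_extra_body` and `THINKING_ONLY_MODELS` are hypothetical names and not part of any SDK; the returned dictionary is meant to be passed as `extra_body` to the OpenAI Python client.

```python
# Hypothetical client-side check based on the model behavior described above.
THINKING_ONLY_MODELS = {"kimi-k2.7-code"}


def thinking_extra_body(model, enable_thinking=None):
    """Build the extra_body dict that carries enable_thinking for a model."""
    if model in THINKING_ONLY_MODELS:
        if enable_thinking is False:
            raise ValueError(f"{model} is thinking-only; thinking mode cannot be disabled")
        # Thinking is always on for these models, so nothing needs to be sent.
        return {}
    if enable_thinking is None:
        # Leave the parameter out and let the service apply its default.
        return {}
    return {"enable_thinking": bool(enable_thinking)}
```

For example, `client.chat.completions.create(model="kimi-k2.6", messages=messages, extra_body=thinking_extra_body("kimi-k2.6", True))` sends the request with thinking mode enabled.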

The following examples pass an image by URL with thinking mode enabled. Each example sends a single image; the commented-out code shows the multi-image variant.

OpenAI compatible

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

# Single-image input example (thinking mode enabled)
completion = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What scene is depicted in the image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    }
                }
            ]
        }
    ],
    extra_body={"enable_thinking":True}  # Enable thinking mode
)

# Print the thinking process
if hasattr(completion.choices[0].message, 'reasoning_content') and completion.choices[0].message.reasoning_content:
    print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
    print(completion.choices[0].message.reasoning_content)

# Print the complete response
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
print(completion.choices[0].message.content)

# Multi-image input example (thinking mode enabled, uncomment to use)
# completion = client.chat.completions.create(
#     model="kimi-k2.6",
#     messages=[
#         {
#             "role": "user",
#             "content": [
#                 {"type": "text", "text": "What do these images depict?"},
#                 {
#                     "type": "image_url",
#                     "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}
#                 },
#                 {
#                     "type": "image_url",
#                     "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}
#                 }
#             ]
#         }
#     ],
#     extra_body={"enable_thinking":True}
# )
#
# # Print the thinking process and complete response
# if hasattr(completion.choices[0].message, 'reasoning_content') and completion.choices[0].message.reasoning_content:
#     print("\nThinking Process:\n" + completion.choices[0].message.reasoning_content)
# print("\nComplete Response:\n" + completion.choices[0].message.content)
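
The single-image and multi-image requests above differ only in how many `image_url` parts the user message contains. A small builder makes that explicit. This is a sketch; `build_user_message` is a hypothetical helper name, and the message structure is the one used in the example above.

```python
def build_user_message(text, image_urls):
    """Build an OpenAI compatible user message with one text part and N image parts."""
    content = [{"type": "text", "text": text}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}


message = build_user_message(
    "What do these images depict?",
    [
        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg",
        "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png",
    ],
)
print(len(message["content"]))  # 3: one text part followed by two image parts
```

The result can be passed as `messages=[message]` to `client.chat.completions.create`.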

Node.js

import OpenAI from "openai";
import process from 'process';

const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

// Single-image input example (thinking mode enabled)
const completion = await openai.chat.completions.create({
    model: 'kimi-k2.6',
    messages: [
        {
            role: 'user',
            content: [
                { type: 'text', text: 'What scene is depicted in the image?' },
                {
                    type: 'image_url',
                    image_url: {
                        url: 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg'
                    }
                }
            ]
        }
    ],
    enable_thinking: true  // Enable thinking mode
});

// Print the thinking process
if (completion.choices[0].message.reasoning_content) {
    console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
    console.log(completion.choices[0].message.reasoning_content);
}

// Print the complete response
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
console.log(completion.choices[0].message.content);

// Multi-image input example (thinking mode enabled, uncomment to use)
// const multiCompletion = await openai.chat.completions.create({
//     model: 'kimi-k2.6',
//     messages: [
//         {
//             role: 'user',
//             content: [
//                 { type: 'text', text: 'What do these images depict?' },
//                 {
//                     type: 'image_url',
//                     image_url: { url: 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg' }
//                 },
//                 {
//                     type: 'image_url',
//                     image_url: { url: 'https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png' }
//                 }
//             ]
//         }
//     ],
//     enable_thinking: true
// });
//
// // Print the thinking process and complete response
// if (multiCompletion.choices[0].message.reasoning_content) {
//     console.log('\nThinking Process:\n' + multiCompletion.choices[0].message.reasoning_content);
// }
// console.log('\nComplete Response:\n' + multiCompletion.choices[0].message.content);

curl

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "kimi-k2.6",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What scene is depicted in the image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    }
                }
            ]
        }
    ],
    "enable_thinking": true
}'

# Multi-image input example (uncomment to use)
# curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
# -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
# -H "Content-Type: application/json" \
# -d '{
#     "model": "kimi-k2.6",
#     "messages": [
#         {
#             "role": "user",
#             "content": [
#                 {
#                     "type": "text",
#                     "text": "What do these images depict?"
#                 },
#                 {
#                     "type": "image_url",
#                     "image_url": {
#                         "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
#                     }
#                 },
#                 {
#                     "type": "image_url",
#                     "image_url": {
#                         "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
#                     }
#                 }
#             ]
#         }
#     ],
#     "enable_thinking": true,
#     "stream": false
# }'

DashScope

Python

import os
from dashscope import MultiModalConversation

# Single-image input example (thinking mode enabled)
response = MultiModalConversation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "What scene is depicted in the image?"},
                {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}
            ]
        }
    ],
    enable_thinking=True  # Enable thinking mode
)

# Print the thinking process
if hasattr(response.output.choices[0].message, 'reasoning_content') and response.output.choices[0].message.reasoning_content:
    print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
    print(response.output.choices[0].message.reasoning_content)

# Print the complete response
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
print(response.output.choices[0].message.content[0]["text"])

# Multi-image input example (thinking mode enabled, uncomment to use)
# response = MultiModalConversation.call(
#     api_key=os.getenv("DASHSCOPE_API_KEY"),
#     model="kimi-k2.6",
#     messages=[
#         {
#             "role": "user",
#             "content": [
#                 {"text": "What do these images depict?"},
#                 {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
#                 {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}
#             ]
#         }
#     ],
#     enable_thinking=True
# )
#
# # Print the thinking process and complete response
# if hasattr(response.output.choices[0].message, 'reasoning_content') and response.output.choices[0].message.reasoning_content:
#     print("\nThinking Process:\n" + response.output.choices[0].message.reasoning_content)
# print("\nComplete Response:\n" + response.output.choices[0].message.content[0]["text"])
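
The DashScope message format uses `{"text": ...}` and `{"image": ...}` parts, whereas the OpenAI compatible format uses `{"type": "text", ...}` and `{"type": "image_url", ...}` parts. When the same prompt has to be sent through both interfaces, the parts can be converted mechanically. The following is a sketch that covers only the text and image parts shown in this document; `to_dashscope_content` is a hypothetical helper name.

```python
def to_dashscope_content(openai_parts):
    """Convert OpenAI compatible content parts to DashScope content parts."""
    converted = []
    for part in openai_parts:
        if part["type"] == "text":
            converted.append({"text": part["text"]})
        elif part["type"] == "image_url":
            # DashScope takes the image URL directly under the "image" key.
            converted.append({"image": part["image_url"]["url"]})
        else:
            raise ValueError(f"Unsupported part type: {part['type']!r}")
    return converted


parts = [
    {"type": "text", "text": "What scene is depicted in the image?"},
    {"type": "image_url", "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}},
]
print(to_dashscope_content(parts))
```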

Java

// Requires DashScope SDK v2.19.4 or later.
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class KimiK26MultiModalExample {
    public static void main(String[] args) {
        try {
            // Single-image input example (thinking mode enabled)
            MultiModalConversation conv = new MultiModalConversation();

            // Build the message content
            Map<String, Object> textContent = new HashMap<>();
            textContent.put("text", "What scene is depicted in the image?");

            Map<String, Object> imageContent = new HashMap<>();
            imageContent.put("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg");

            MultiModalMessage userMessage = MultiModalMessage.builder()
                    .role(Role.USER.getValue())
                    .content(Arrays.asList(textContent, imageContent))
                    .build();

            // Build the request parameters
            MultiModalConversationParam param = MultiModalConversationParam.builder()
                    // If the environment variable is not set, replace this with your API key from Model Studio.
                    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                    .model("kimi-k2.6")
                    .messages(Arrays.asList(userMessage))
                    .enableThinking(true)  // Enable thinking mode
                    .build();

            // Call the model
            MultiModalConversationResult result = conv.call(param);

            // Print the response
            // Convert the "text" entry of the first content item to a String before printing.
            String content = String.valueOf(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
            System.out.println("Response: " + content);

            // If thinking mode is enabled, print the thinking process
            if (result.getOutput().getChoices().get(0).getMessage().getReasoningContent() != null) {
                System.out.println("\nThinking Process: " +
                    result.getOutput().getChoices().get(0).getMessage().getReasoningContent());
            }

            // Multi-image input example (uncomment to use)
            // Map<String, Object> imageContent1 = new HashMap<>();
            // imageContent1.put("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg");
            // Map<String, Object> imageContent2 = new HashMap<>();
            // imageContent2.put("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png");
            //
            // Map<String, Object> textContent2 = new HashMap<>();
            // textContent2.put("text", "What do these images depict?");
            //
            // MultiModalMessage multiImageMessage = MultiModalMessage.builder()
            //         .role(Role.USER.getValue())
            //         .content(Arrays.asList(textContent2, imageContent1, imageContent2))
            //         .build();
            //
            // MultiModalConversationParam multiParam = MultiModalConversationParam.builder()
            //         .apiKey(System.getenv("DASHSCOPE_API_KEY"))
            //         .model("kimi-k2.6")
            //         .messages(Arrays.asList(multiImageMessage))
            //         .enableThinking(true)
            //         .build();
            //
            // MultiModalConversationResult multiResult = conv.call(multiParam);
            // System.out.println(multiResult.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));

        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.err.println("Call failed: " + e.getMessage());
        }
    }
}

curl

curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "kimi-k2.6",
    "input": {
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "text": "What scene is depicted in the image?"
                    },
                    {
                        "image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    }
                ]
            }
        ]
    },
    "parameters": {
        "enable_thinking": true
    }
}'

# Multi-image input example (uncomment to use)
# curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
# -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
# -H "Content-Type: application/json" \
# -d '{
#     "model": "kimi-k2.6",
#     "input": {
#         "messages": [
#             {
#                 "role": "user",
#                 "content": [
#                     {
#                         "text": "What do these images depict?"
#                     },
#                     {
#                         "image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
#                     },
#                     {
#                         "image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
#                     }
#                 ]
#             }
#         ]
#     },
#     "parameters": {
#         "enable_thinking": true
#     }
# }'

Video understanding

Video file

The kimi-k2.7-code, kimi-k2.6, and kimi-k2.5 models analyze videos by extracting a sequence of frames. You can control the frame extraction strategy with the following parameters:

  • fps: Controls the frame extraction frequency. The interval between extracted frames is 1/fps seconds. The value must be in the range of [0.1, 10]. The default value is 2.0.

    • For high-motion scenes: Set a higher fps value to capture more detail.

    • For static or long videos: Set a lower fps value to improve processing efficiency.

  • max_frames: Specifies the maximum number of frames to extract from a video. The default and maximum value is 2000.

    If the number of frames calculated from the fps value exceeds this limit, the system automatically extracts frames uniformly to stay within the max_frames limit. This parameter is available only when you use the DashScope SDK.
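
The interaction between fps and max_frames described above can be sketched in a few lines of Python. This is only an illustration: the helper name estimate_frame_count is hypothetical, and the rounding rule (floor, with at least one frame) is an assumption, since the service's exact sampling arithmetic is not documented here.

```python
import math

# Limits taken from the parameter descriptions above.
FPS_MIN, FPS_MAX = 0.1, 10.0
DEFAULT_FPS = 2.0
MAX_FRAMES_LIMIT = 2000


def estimate_frame_count(duration_s, fps=DEFAULT_FPS, max_frames=MAX_FRAMES_LIMIT):
    """Estimate how many frames are sampled from a video of duration_s seconds.

    Hypothetical helper: the floor rounding is an assumption, not documented behavior.
    """
    if not FPS_MIN <= fps <= FPS_MAX:
        raise ValueError(f"fps must be in [{FPS_MIN}, {FPS_MAX}], got {fps}")
    if not 1 <= max_frames <= MAX_FRAMES_LIMIT:
        raise ValueError(f"max_frames must be in [1, {MAX_FRAMES_LIMIT}], got {max_frames}")
    # One frame every 1/fps seconds gives roughly duration * fps frames.
    frames_by_fps = max(1, math.floor(duration_s * fps))
    # When that exceeds max_frames, frames are sampled uniformly within the cap.
    return min(frames_by_fps, max_frames)


print(estimate_frame_count(60))             # 1-minute clip at the default fps
print(estimate_frame_count(3600))           # 1-hour video, capped by max_frames
print(estimate_frame_count(3600, fps=0.5))  # lower fps stays under the cap
```

A rough estimate like this can help you decide whether to lower fps for a long video before sending the request.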

OpenAI compatible

When passing a video file to the model using the OpenAI SDK or an HTTP request, set the "type" parameter in the user message to "video_url".

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                # When passing a video file directly, set the "type" parameter to "video_url".
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                    },
                    "fps": 2
                },
                {
                    "type": "text",
                    "text": "What is the content of this video?"
                }
            ]
        }
    ]
)

print(completion.choices[0].message.content)

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});

async function main() {
    const response = await openai.chat.completions.create({
        model: "kimi-k2.6",
        messages: [
            {
                role: "user",
                content: [
                    // When passing a video file directly, set the "type" parameter to "video_url".
                    {
                        type: "video_url",
                        video_url: {
                            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                        },
                        "fps": 2
                    },
                    {
                        type: "text",
                        text: "What is the content of this video?"
                    }
                ]
            }
        ]
    });

    console.log(response.choices[0].message.content);
}

main();

curl

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kimi-k2.6",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "video_url",
            "video_url": {
              "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
            },
            "fps":2
          },
          {
            "type": "text",
            "text": "What is the content of this video?"
          }
        ]
      }
    ]
  }'

DashScope

Python

import dashscope
import os

dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

messages = [
    {"role": "user",
        "content": [
            # The fps parameter sets the frame extraction frequency; the interval between frames is 1/fps seconds.
            {"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4","fps":2},
            {"text": "What is the content of this video?"}
        ]
    }
]

response = dashscope.MultiModalConversation.call(
    # If the DASHSCOPE_API_KEY environment variable is not set, replace this line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='kimi-k2.6',
    messages=messages
)

print(response.output.choices[0].message.content[0]["text"])

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
    
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        // The fps parameter sets the frame extraction frequency; the interval between frames is 1/fps seconds.
        Map<String, Object> params = new HashMap<>();
        params.put("video", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4");
        params.put("fps", 2);
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        params,
                        Collections.singletonMap("text", "What is the content of this video?"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("kimi-k2.6")
                .messages(Arrays.asList(userMessage))
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }
    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "kimi-k2.6",
    "input":{
        "messages":[
            {"role": "user","content": [{"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4","fps":2},
            {"text": "What is the content of this video?"}]}]}
}'

Image list

When you provide a video as an image list (pre-extracted frames), use the fps parameter to specify the rate at which the frames were extracted from the original video. A value of fps means that the frames were extracted every 1/fps seconds, which helps the model better understand the sequence of events, duration, and dynamic changes.
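
To make the relationship between fps and frame timing concrete, the following Python sketch maps each frame in an image list to its approximate position in the original video. The helper names are hypothetical, and placing the first frame at 0 seconds is an assumption made for illustration.

```python
def frame_timestamps(num_frames, fps):
    """Return the approximate timestamp (in seconds) of each frame in the list.

    Assumes the first frame sits at t = 0 and frames are 1/fps seconds apart.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    interval = 1.0 / fps
    return [round(i * interval, 3) for i in range(num_frames)]


def fps_for_interval(interval_s):
    """Return the fps value to send when frames were extracted every interval_s seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return 1.0 / interval_s


# Four frames sent with fps=2 cover roughly the first 1.5 seconds of the video.
print(frame_timestamps(4, 2))
# Frames extracted every 0.5 seconds should be sent with fps=2.
print(fps_for_interval(0.5))
```

If you extracted the frames yourself at a known interval, fps_for_interval gives the value to pass so that it matches how the frames were actually sampled.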

OpenAI compatible

When passing a video as an image list using the OpenAI SDK or an HTTP request, set the "type" parameter in the user message to "video".

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="kimi-k2.6", 
    messages=[{"role": "user","content": [
        # When passing an image list, set the "type" parameter in the user message to "video".
         {"type": "video","video": [
         "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
         "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
         "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
         "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
         "fps":2},
         {"type": "text","text": "Describe the action in this video."},
    ]}]
)

print(completion.choices[0].message.content)

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});

async function main() {
    const response = await openai.chat.completions.create({
        model: "kimi-k2.6",  
        messages: [{
            role: "user",
            content: [
                {
                    // When passing an image list, set the "type" parameter in the user message to "video".
                    type: "video",
                    video: [
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
                        "fps":2
                },
                {
                    type: "text",
                    text: "Describe the action in this video."
                }
            ]
        }]
    });
    console.log(response.choices[0].message.content);
}

main();

curl

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "kimi-k2.6",
    "messages": [{"role": "user","content": [{"type": "video","video": [
                  "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                  "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                  "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                  "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
                  "fps":2},
                {"type": "text","text": "Describe the action in this video."}]}]
}'

DashScope

Python

import os
import dashscope

dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

messages = [{"role": "user",
             "content": [
                 {"video":["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
                   "fps":2},
                 {"text": "Describe the action in this video."}]}]
response = dashscope.MultiModalConversation.call(
    # If the DASHSCOPE_API_KEY environment variable is not set, replace this line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model='kimi-k2.6', 
    messages=messages
)
print(response.output.choices[0].message.content[0]["text"])

Java

// Requires DashScope SDK v2.21.10 or later.
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}

    private static final String MODEL_NAME = "kimi-k2.6"; 
    public static void videoImageListSample() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        Map<String, Object> params = new HashMap<>();
        params.put("video", Arrays.asList("https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"));
        params.put("fps", 2);
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        params,
                        Collections.singletonMap("text", "Describe the action in this video.")))
                .build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(MODEL_NAME)
                .messages(Arrays.asList(userMessage)).build();
        MultiModalConversationResult result = conv.call(param);
        System.out.print(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }
    public static void main(String[] args) {
        try {
            videoImageListSample();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "kimi-k2.6",
  "input": {
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "video": [
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
            ],
            "fps": 2
          },
          {
            "text": "Describe the action in this video."
          }
        ]
      }
    ]
  }
}'

Pass a local file

The following examples show how to pass a local file. The OpenAI-compatible API supports only Base64 encoding, while DashScope supports both Base64 encoding and file paths.

OpenAI compatible

To pass a local file using Base64 encoding, construct a Data URL. For instructions, see Construct a Data URL.
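
As a rough sketch of the Data URL format, the following Python helper builds one from any local file by guessing the MIME type from the file extension. The helper name to_data_url is hypothetical, and the octet-stream fallback is an assumption; the examples in this document use only image/png, image/jpeg, and video/mp4.

```python
import base64
import mimetypes


def to_data_url(path, default_mime="application/octet-stream"):
    """Build a Data URL of the form data:<mime>;base64,<payload> for a local file.

    Hypothetical helper: the MIME type is guessed from the file extension.
    """
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime or default_mime};base64,{payload}"
```

For example, to_data_url("xxx/eagle.png") would produce a string starting with data:image/png;base64, which can be passed wherever the examples below expect a Data URL.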

Python

from openai import OpenAI
import os
import base64

# Encoding function: Converts a local file to a Base64-encoded string.
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Replace "xxx/eagle.png" with the absolute path to your local image.
base64_image = encode_image("xxx/eagle.png")

client = OpenAI(
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"}, 
                },
                {"type": "text", "text": "What scene is depicted in the image?"},
            ],
        }
    ],
)
print(completion.choices[0].message.content)


# The following examples show how to pass a local video file and a local image list.

# [Local video file] Encode the local video as a Data URL and pass it to the video_url parameter:
#   def encode_video_to_data_url(video_path):
#       with open(video_path, "rb") as f:
#           return "data:video/mp4;base64," + base64.b64encode(f.read()).decode("utf-8")

#   video_data_url = encode_video_to_data_url("xxx/local.mp4")
#   content = [{"type": "video_url", "video_url": {"url": video_data_url}, "fps": 2}, {"type": "text", "text": "What is the content of this video?"}]

# [Local image list] Encode multiple local images with Base64 and pass them as a list to the video parameter:
#   image_data_urls = [f"data:image/jpeg;base64,{encode_image(p)}" for p in ["xxx/f1.jpg", "xxx/f2.jpg", "xxx/f3.jpg", "xxx/f4.jpg"]]
#   content = [{"type": "video", "video": image_data_urls, "fps": 2}, {"type": "text", "text": "Describe the sequence of events in this video."}]

Node.js

import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
};
// Replace "xxx/eagle.png" with the absolute path to your local image.
const base64Image = encodeImage("xxx/eagle.png")
async function main() {
    const completion = await openai.chat.completions.create({
        model: "kimi-k2.6", 
        messages: [
            {"role": "user",
             "content": [{"type": "image_url",
                        "image_url": {"url": `data:image/png;base64,${base64Image}`},},
                        {"type": "text", "text": "What scene is depicted in the image?"}]}]
    });
    console.log(completion.choices[0].message.content);
}

main();

// The following examples show how to pass a local video file and a local image list.

// [Local video file] Encode the local video as a Data URL and pass it to the video_url parameter:
//   const encodeVideoToDataUrl = (videoPath) => "data:video/mp4;base64," + readFileSync(videoPath).toString("base64");
//   const videoDataUrl = encodeVideoToDataUrl("xxx/local.mp4");
//   content: [{ type: "video_url", video_url: { url: videoDataUrl }, fps: 2 }, { type: "text", text: "What is the content of this video?" }]

// [Local image list] Encode multiple local images with Base64 and pass them as a list to the video parameter:
//   const imageDataUrls = ["xxx/f1.jpg","xxx/f2.jpg","xxx/f3.jpg","xxx/f4.jpg"].map(p => `data:image/jpeg;base64,${encodeImage(p)}`);
//   content: [{ type: "video", video: imageDataUrls, fps: 2 }, { type: "text", text: "Describe the sequence of events in this video." }]

//   messages: [{"role": "user", "content": content}] 
//   Then call openai.chat.completions.create({model: "kimi-k2.6", messages: messages})

DashScope

Base64 encoding

To pass a local file using Base64 encoding, construct a Data URL. For instructions, see Construct a Data URL.

Python

import base64
import os
import dashscope 
from dashscope import MultiModalConversation

dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

# Encoding function: Converts a local file to a Base64-encoded string.
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Replace "xxx/eagle.png" with the absolute path to your local image.
base64_image = encode_image("xxx/eagle.png")

messages = [
    {
        "role": "user",
        "content": [
            {"image": f"data:image/png;base64,{base64_image}"},
            {"text": "What scene is depicted in the image?"},
        ],
    },
]
response = MultiModalConversation.call(
    # If the DASHSCOPE_API_KEY environment variable is not set, pass your Model Studio API key directly, for example: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="kimi-k2.6", 
    messages=messages,
)
print(response.output.choices[0].message.content[0]["text"])

# The following examples show how to pass a local video file and a local image list.

# [Local video file]
#   video_data_url = "data:video/mp4;base64," + base64.b64encode(open("xxx/local.mp4","rb").read()).decode("utf-8")
#   content = [{"video": video_data_url, "fps": 2}, {"text": "What is the content of this video?"}]

# [Local image list]
#   image_data_urls = [f"data:image/jpeg;base64,{encode_image(p)}" for p in ["xxx/f1.jpg","xxx/f2.jpg","xxx/f3.jpg","xxx/f4.jpg"]]
#   content = [{"video": image_data_urls, "fps": 2}, {"text": "Describe the sequence of events in this video."}]

Java

import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Base64;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import com.alibaba.dashscope.aigc.multimodalconversation.*;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {

   static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}

    private static String encodeToBase64(String imagePath) throws IOException {
        Path path = Paths.get(imagePath);
        byte[] imageBytes = Files.readAllBytes(path);
        return Base64.getEncoder().encodeToString(imageBytes);
    }
    

    public static void callWithLocalFile(String localPath) throws ApiException, NoApiKeyException, UploadFileException, IOException {

        String base64Image = encodeToBase64(localPath);

        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        new HashMap<String, Object>() {{ put("image", "data:image/png;base64," + base64Image); }},
                        new HashMap<String, Object>() {{ put("text", "What scene is depicted in the image?"); }}
                )).build();

        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("kimi-k2.6")
                .messages(Arrays.asList(userMessage))
                .build();

        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            // Replace "xxx/eagle.png" with the absolute path to your local image.
            callWithLocalFile("xxx/eagle.png");
        } catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
    
    // The following examples show how to pass a local video file and a local image list.
    // [Local video file]
    // String base64Video = encodeToBase64(localPath);
    // MultiModalConversation conv = new MultiModalConversation();
    // MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //         .content(Arrays.asList(
    //                 new HashMap<String, Object>() {{ put("video", "data:video/mp4;base64," + base64Video); }},
    //                 new HashMap<String, Object>() {{ put("text", "What scene is depicted in this video?"); }}
    //         )).build();

    // [Local image list]
    // (Requires import java.util.List.)
    // List<String> urls = Arrays.asList(
    //         "data:image/jpeg;base64," + encodeToBase64("path/f1.jpg"),
    //         "data:image/jpeg;base64," + encodeToBase64("path/f2.jpg"),
    //         "data:image/jpeg;base64," + encodeToBase64("path/f3.jpg"),
    //         "data:image/jpeg;base64," + encodeToBase64("path/f4.jpg"));
    // MultiModalConversation conv = new MultiModalConversation();
    // MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //         .content(Arrays.asList(
    //                 new HashMap<String, Object>() {{ put("video", urls); }},
    //                 new HashMap<String, Object>() {{ put("text", "What scene is depicted in this video?"); }}
    //         )).build();

}

File path

You can pass a local file path directly to the model. This method is supported only by the DashScope Python and Java SDKs; it is not available for DashScope HTTP or the OpenAI-compatible API. The table below shows the required file path format for each programming language and operating system.

Specify a file path (image example)

System | SDK | Path format | Example
Linux or macOS | Python SDK | file://{absolute path of the file} | file:///home/images/test.png
Linux or macOS | Java SDK | file://{absolute path of the file} | file:///home/images/test.png
Windows | Python SDK | file://{absolute path of the file} | file://D:/images/test.png
Windows | Java SDK | file:///{absolute path of the file} | file:///D:/images/test.png
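
The path formats in the table above can be sketched as a small Python helper. The function name to_file_uri and its sdk and windows parameters are hypothetical, and converting backslashes to forward slashes is an assumption based on the Windows examples in the table.

```python
def to_file_uri(abs_path, sdk="python", windows=False):
    """Build the file:// path expected by the DashScope SDKs for a local file.

    Hypothetical helper that mirrors the table above:
      - Linux/macOS (both SDKs) and Windows + Python SDK: file://{absolute path}
      - Windows + Java SDK: file:///{absolute path}
    """
    if sdk not in ("python", "java"):
        raise ValueError("sdk must be 'python' or 'java'")
    # Assumption: Windows paths are written with forward slashes, as in the table examples.
    path = abs_path.replace("\\", "/") if windows else abs_path
    prefix = "file:///" if (windows and sdk == "java") else "file://"
    return prefix + path


print(to_file_uri("/home/images/test.png"))
print(to_file_uri("D:/images/test.png", sdk="python", windows=True))
print(to_file_uri("D:/images/test.png", sdk="java", windows=True))
```

On Linux and macOS the absolute path already starts with a slash, which is why the result has three slashes after file: even though the prefix has only two.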

Python

import os
from dashscope import MultiModalConversation
import dashscope 

dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

# Replace "xxx/eagle.png" with the absolute path to your local image.
local_path = "xxx/eagle.png"
image_path = f"file://{local_path}"
messages = [
                {'role':'user',
                'content': [{'image': image_path},
                            {'text': 'What scene is depicted in the image?'}]}]
response = MultiModalConversation.call(
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='kimi-k2.6',  
    messages=messages)
print(response.output.choices[0].message.content[0]["text"])

# The following examples show how to pass a local video and a list of local images using file paths.
# [Local video file]
#  video_path = "file:///path/to/local.mp4"
#  content = [{"video": video_path, "fps": 2}, {"text": "What is the content of this video?"}]

# [Local image list]
# image_paths = ["file:///path/f1.jpg", "file:///path/f2.jpg", "file:///path/f3.jpg", "file:///path/f4.jpg"]
# content = [{"video": image_paths, "fps": 2}, {"text": "Describe the sequence of events in this video."}]

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {

    static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
    
    public static void callWithLocalFile(String localPath)
            throws ApiException, NoApiKeyException, UploadFileException {
        String filePath = "file://"+localPath;
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(new HashMap<String, Object>(){{put("image", filePath);}},
                        new HashMap<String, Object>(){{put("text", "What scene is depicted in the image?");}})).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("kimi-k2.6")  
                .messages(Arrays.asList(userMessage))
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            // Replace "xxx/eagle.png" with the absolute path to your local image.
            callWithLocalFile("xxx/eagle.png");
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
    
    // The following examples show how to pass a local video and a list of local images using file paths.
    
    // [Local video file]
    //  String filePath = "file://"+localPath;
    //    MultiModalConversation conv = new MultiModalConversation();
    //    MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //            .content(Arrays.asList(new HashMap<String, Object>(){{put("video", filePath);}},
    //                    new HashMap<String, Object>(){{put("text", "What scene is depicted in the video?");}})).build();

    // [Local image list]
    
    //    MultiModalConversation conv = new MultiModalConversation();
    //    List<String> filePath = Arrays.asList("file:///path/f1.jpg", "file:///path/f2.jpg", "file:///path/f3.jpg", "file:///path/f4.jpg");
    //    MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //            .content(Arrays.asList(new HashMap<String, Object>(){{put("video", filePath);}},
    //                    new HashMap<String, Object>(){{put("text", "What scene is depicted in the video?");}})).build();
}

File limitations

Image limitations

  • Image resolution:

    • Minimum size: Width and height must each exceed 10 pixels.

    • Aspect ratio: The ratio of the longest side to the shortest side must not exceed 200:1.

    • Maximum resolution: The recommended maximum is 8K (7680x4320). Higher resolutions may cause API call timeouts due to large file sizes or slow network transfers.

  • Supported image formats

    • The following formats are supported for resolutions below 4K (3840x2160):

      Image format | File extension    | MIME type
      BMP          | .bmp              | image/bmp
      JPEG         | .jpe, .jpeg, .jpg | image/jpeg
      PNG          | .png              | image/png
      TIFF         | .tif, .tiff       | image/tiff
      WEBP         | .webp             | image/webp
      HEIC         | .heic             | image/heic

    • For resolutions between 4K (3840x2160) and 8K (7680x4320), only JPEG, JPG, and PNG are supported.

  • Image size:

    • When providing an image via a public URL or local path, its size must not exceed 10 MB.

    • When using Base64 encoding, the encoded string must not exceed 10 MB.

    To compress a file, see How to compress an image or video to meet the size limit.
  • Number of supported images: When providing multiple images, the total number of tokens for all images and text must not exceed the model's maximum input limit.
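
The image limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Model Studio SDK: the function names are hypothetical, "10 MB" is read as 10 MiB, and "below 4K" is interpreted as a pixel count below 3840 x 2160. Both interpretations are assumptions. The Base64 length is computed from the standard 4-characters-per-3-bytes rule.

```python
MAX_BYTES = 10 * 1024 * 1024      # assumption: "10 MB" is read as 10 MiB
PIXELS_4K = 3840 * 2160           # assumption: "4K" is compared by pixel count
HIGH_RES_EXTS = {".jpe", ".jpeg", ".jpg", ".png"}
ALL_EXTS = HIGH_RES_EXTS | {".bmp", ".tif", ".tiff", ".webp", ".heic"}


def base64_length(raw_bytes: int) -> int:
    """Length of the padded Base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def check_image(width, height, file_bytes, ext, as_base64=False):
    """Return a list of violated limits (empty if none were found)."""
    problems = []
    if width <= 10 or height <= 10:
        problems.append("width and height must each exceed 10 pixels")
    elif max(width, height) / min(width, height) > 200:
        problems.append("aspect ratio exceeds 200:1")

    ext = ext.lower()
    if ext not in ALL_EXTS:
        problems.append("unsupported format")
    elif width * height >= PIXELS_4K and ext not in HIGH_RES_EXTS:
        problems.append("only JPEG and PNG are supported at 4K and above")

    # For Base64 input the limit applies to the encoded string, not the file.
    payload = base64_length(file_bytes) if as_base64 else file_bytes
    if payload > MAX_BYTES:
        problems.append("payload exceeds 10 MB")
    return problems


print(check_image(1920, 1080, 8_000_000, ".png"))
print(check_image(1920, 1080, 8_000_000, ".png", as_base64=True))
```

Because Base64 inflates data by about one third, a file of roughly 7.5 MB or more already exceeds the limit once encoded, even though the same file is accepted by URL or local path.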

Video limitations

  • As an image list: 4 to 2,000 images.

  • As a video file:

    • Video size:

      • Via public URL: Up to 2 GB.

      • Via Base64 encoding: The encoded string must be less than 10 MB.

      • Via local file path: Up to 100 MB.

    • Video duration: 2 seconds to 1 hour.

  • Video format: Supported formats include MP4, AVI, MKV, MOV, FLV, and WMV.

  • Video resolution: While there is no strict resolution limit, use 2K or lower for best results. Higher resolutions increase processing time without improving model understanding.

  • Audio understanding: The model does not process the audio track in video files.
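
The video limits above depend on how the video is supplied. The sketch below shows one way to encode them as a pre-flight check. The function names and the `source` values ("url", "base64", "local") are hypothetical, and MB and GB are read as binary units, which is an assumption.

```python
MB = 1024 * 1024                  # assumption: binary megabytes
VIDEO_EXTS = {".mp4", ".avi", ".mkv", ".mov", ".flv", ".wmv"}
SIZE_LIMITS = {"url": 2 * 1024 * MB, "local": 100 * MB}


def check_frame_list(frame_count: int) -> bool:
    """An image list passed as a video must contain 4 to 2,000 images."""
    return 4 <= frame_count <= 2000


def check_video(source, payload_bytes, duration_s, ext):
    """Return a list of violated limits for a video file.

    payload_bytes is the file size for "url" and "local" sources and the
    length of the encoded string for "base64".
    """
    problems = []
    if ext.lower() not in VIDEO_EXTS:
        problems.append("unsupported format")
    if source == "base64":
        if payload_bytes >= 10 * MB:
            problems.append("encoded string must be less than 10 MB")
    elif payload_bytes > SIZE_LIMITS[source]:
        problems.append(f"file too large for source '{source}'")
    if not 2 <= duration_s <= 3600:
        problems.append("duration must be between 2 seconds and 1 hour")
    return problems


print(check_video("local", 150 * MB, 30, ".mp4"))
print(check_video("url", 150 * MB, 30, ".mp4"))
```

The same 150 MB file fails as a local path but passes as a public URL, so hosting larger videos at a URL avoids the local-path limit.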

Other features

Model                     | Multi-turn conversation | Deep thinking | Function calling | Structured output | Web search    | Prefix completion | Context cache
kimi-k2.7-code            | Supported               | Supported     | Supported        | Not supported     | Not supported | Not supported     | Supported
kimi-k2.6                 | Supported               | Supported     | Supported        | Not supported     | Not supported | Not supported     | Supported
kimi-k2.5                 | Supported               | Supported     | Supported        | Not supported     | Not supported | Not supported     | Supported
kimi-k2-thinking          | Supported               | Supported     | Supported        | Supported         | Not supported | Not supported     | Supported
Moonshot-Kimi-K2-Instruct | Supported               | Not supported | Supported        | Not supported     | Supported     | Not supported     | Supported

Default parameters

Model                     | enable_thinking           | temperature                                | top_p            | presence_penalty | fps | max_frames
kimi-k2.7-code            | true (thinking mode only) | 1.0                                        | 0.95             | 0.0              | 2   | 2000
kimi-k2.6                 | false                     | thinking mode: 1.0; non-thinking mode: 0.6 | Both modes: 0.95 | Both modes: 0.0  | 2   | 2000
kimi-k2.5                 | false                     | thinking mode: 1.0; non-thinking mode: 0.6 | Both modes: 0.95 | Both modes: 0.0  | 2   | 2000
kimi-k2-thinking          | -                         | 1.0                                        | -                | -                | -   | -
Moonshot-Kimi-K2-Instruct | -                         | 0.6                                        | 1.0              | 0                | -   | -

A hyphen (-) indicates that the parameter is not applicable.
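
For kimi-k2.6 and kimi-k2.5, the default temperature depends on whether thinking mode is enabled. The sketch below mirrors the table as a lookup so that the effective defaults for a given mode are explicit. It is an illustration only: `DEFAULTS` and `default_params` are hypothetical names, and the service applies these defaults itself when a parameter is omitted.

```python
# Values copied from the default-parameters table; None means "not applicable".
DEFAULTS = {
    "kimi-k2.7-code": {"enable_thinking": True, "temperature": 1.0, "top_p": 0.95,
                       "presence_penalty": 0.0, "fps": 2, "max_frames": 2000},
    "kimi-k2.6": {"enable_thinking": False, "temperature": None, "top_p": 0.95,
                  "presence_penalty": 0.0, "fps": 2, "max_frames": 2000},
    "kimi-k2.5": {"enable_thinking": False, "temperature": None, "top_p": 0.95,
                  "presence_penalty": 0.0, "fps": 2, "max_frames": 2000},
    "kimi-k2-thinking": {"enable_thinking": None, "temperature": 1.0, "top_p": None,
                         "presence_penalty": None, "fps": None, "max_frames": None},
    "Moonshot-Kimi-K2-Instruct": {"enable_thinking": None, "temperature": 0.6,
                                  "top_p": 1.0, "presence_penalty": 0,
                                  "fps": None, "max_frames": None},
}
# Models whose temperature default depends on the thinking mode.
HYBRID_MODELS = {"kimi-k2.6", "kimi-k2.5"}


def default_params(model, enable_thinking=None):
    """Return the applicable defaults for a model, resolved for the chosen mode."""
    row = dict(DEFAULTS[model])
    if model in HYBRID_MODELS:
        thinking = bool(enable_thinking)  # omitted -> False, the table default
        row["enable_thinking"] = thinking
        row["temperature"] = 1.0 if thinking else 0.6
    # Drop parameters that are not applicable to this model.
    return {k: v for k, v in row.items() if v is not None}


print(default_params("kimi-k2.6"))
print(default_params("kimi-k2.6", enable_thinking=True))
```

For models outside `HYBRID_MODELS`, the `enable_thinking` argument is ignored in this sketch because the table lists a single mode for them.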

Models and billing

The Kimi series is a family of large language models from Moonshot AI.

  • kimi-k2.7-code: The most capable Kimi model for coding. It follows long-context instructions more reliably and achieves higher success rates on programming tasks. Supports text, image, and video input, thinking mode, conversation, and agent tasks.

  • kimi-k2.6: The newest and most capable model in the Kimi series. It offers improved performance in long-horizon coding, instruction following, and self-correction. Supports text, image, and video input, thinking and non-thinking modes, conversation, and agent tasks.

  • kimi-k2.5: It achieves state-of-the-art (SOTA) performance on open-source benchmarks for agent tasks, code generation, visual understanding, and other general intelligence tasks. Supports image, video, and text input, thinking and non-thinking modes, conversation, and agent tasks.

  • kimi-k2-thinking: Supports deep thinking mode only. It exposes the reasoning process through the reasoning_content field. It excels at coding and tool calling, and is suitable for use cases that require logical analysis, planning, or deep understanding.

  • Moonshot-Kimi-K2-Instruct: Does not support deep thinking. It generates responses with lower latency, and is suitable for use cases that need fast, direct answers.

For kimi-k2.7-code pricing, see model invocation billing.

For pricing and context window details, see the Model Studio console.

Billing is based on input and output token counts.

In thinking mode, the chain of thought counts as output tokens.
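
As a rough illustration of the two billing rules above, the following sketch adds the chain-of-thought tokens to the output count before pricing. The function names are hypothetical, and the per-million-token prices are placeholder arguments, not actual Model Studio prices.

```python
def billable_tokens(input_tokens, answer_tokens, reasoning_tokens=0):
    """Return (billed input tokens, billed output tokens).

    In thinking mode the chain of thought is billed as output,
    so it is added to the answer tokens.
    """
    return input_tokens, answer_tokens + reasoning_tokens


def estimate_cost(input_tokens, answer_tokens, reasoning_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimate the cost of one call. Prices are caller-supplied placeholders."""
    billed_in, billed_out = billable_tokens(
        input_tokens, answer_tokens, reasoning_tokens)
    return (billed_in * input_price_per_million
            + billed_out * output_price_per_million) / 1_000_000


# Placeholder prices (1.0 and 2.0 per million tokens), for illustration only.
print(billable_tokens(1_000, 200, 800))
print(estimate_cost(1_000_000, 500_000, 500_000, 1.0, 2.0))
```

A long chain of thought can therefore dominate the cost of a call even when the final answer is short.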

Error codes

If a model call fails and returns an error message, see Error codes.