This topic describes how to call Kimi series models on the Alibaba Cloud Model Studio platform using the OpenAI-compatible API or the DashScope SDK.
This document applies only to the China (Beijing) region. To use these models, you must use an API key from the China (Beijing) region.
Model introduction
The Kimi series models are large language models (LLMs) developed by Moonshot AI.
kimi-k2.5: The most intelligent Kimi model to date, achieving open-source SoTA performance in agent tasks, coding, visual understanding, and a wide range of general-intelligence tasks. It is also the most versatile Kimi model so far, with a native multimodal architecture that supports both text and visual input, thinking and non-thinking modes, and both dialog and agent tasks.
kimi-k2-thinking: Supports only deep thinking mode. Its reasoning process is returned in the reasoning_content field. The model excels at coding and tool calling, making it suitable for scenarios that require logical analysis, planning, or deep understanding.
Moonshot-Kimi-K2-Instruct: Does not support deep thinking. It generates responses directly for faster performance, making it suitable for scenarios that require quick, direct answers.
Model | Mode | Context window (tokens) | Max input (tokens) | Max CoT (tokens) | Max response (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
kimi-k2.5 | Thinking mode | 262,144 | 258,048 | 32,768 | 32,768 | $0.574 | $3.011 |
kimi-k2.5 | Non-thinking mode | 262,144 | 260,096 | - | 32,768 | $0.574 | $3.011 |
kimi-k2-thinking | Thinking mode | 262,144 | 229,376 | 32,768 | 16,384 | $0.574 | $2.294 |
Moonshot-Kimi-K2-Instruct | Non-thinking mode | 131,072 | 131,072 | - | 8,192 | $0.574 | $2.294 |
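As a quick sanity check on the pricing columns above, the cost of a call can be estimated from its token counts. The sketch below is illustrative only: the helper name and the hard-coded kimi-k2.5 rates are assumptions taken from the table, not part of any SDK, and it does not model any billing rules beyond per-token pricing.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price: float = 0.574, output_price: float = 3.011) -> float:
    """Estimate call cost in USD. Prices are per 1M tokens (kimi-k2.5 rates)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# A modest request: 10,000 input tokens and 2,000 output tokens
print(round(estimate_cost_usd(10_000, 2_000), 4))  # → 0.0118
```

For kimi-k2-thinking, the output rate would drop to 2.294 per the table; pass it as `output_price`.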
Text generation example
Before using the API, you must obtain an API key and set it as an environment variable. If you make calls using an SDK, install the SDK.
OpenAI compatible
Python
Sample code
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="kimi-k2-thinking",
    messages=[{"role": "user", "content": "Who are you?"}],
    stream=True,
)
reasoning_content = ""  # Complete thinking process
answer_content = ""  # Complete response
is_answering = False  # Indicates whether the model has started generating the response
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
for chunk in completion:
    if chunk.choices:
        delta = chunk.choices[0].delta
        # Collect only the thinking content
        if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
            if not is_answering:
                print(delta.reasoning_content, end="", flush=True)
            reasoning_content += delta.reasoning_content
        # When content is received, start generating the response
        if hasattr(delta, "content") and delta.content:
            if not is_answering:
                print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
                is_answering = True
            print(delta.content, end="", flush=True)
            answer_content += delta.content
Response
====================Thinking Process====================
The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.
I am an AI assistant named Kimi, developed by Moonshot AI. I should introduce myself clearly and concisely, including the following:
1. My identity: AI assistant
2. My developer: Moonshot AI
3. My name: Kimi
4. My core capabilities: long-text processing, intelligent conversation, file processing, search, etc.
I should maintain a friendly and professional tone, avoiding overly technical jargon so that regular users can understand. I should also emphasize that I am an AI without personal consciousness, emotions, or experiences.
Response structure:
- Directly state my identity
- Mention my developer
- Briefly introduce my core capabilities
- Keep it concise and clear
====================Complete Response====================
I am an AI assistant named Kimi, developed by Moonshot AI. I am based on a Mixture-of-Experts (MoE) architecture and have capabilities such as long-context understanding, intelligent conversation, file processing, code generation, and complex task reasoning. How can I help you?
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';
// Initialize the OpenAI client
const openai = new OpenAI({
// If the environment variable is not set, replace this with your Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = ''; // Complete thinking process
let answerContent = ''; // Complete response
let isAnswering = false; // Indicates whether the model has started generating the response
async function main() {
const messages = [{ role: 'user', content: 'Who are you?' }];
const stream = await openai.chat.completions.create({
model: 'kimi-k2-thinking',
messages,
stream: true,
});
console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
for await (const chunk of stream) {
if (chunk.choices?.length) {
const delta = chunk.choices[0].delta;
// Collect only the thinking content
if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
if (!isAnswering) {
process.stdout.write(delta.reasoning_content);
}
reasoningContent += delta.reasoning_content;
}
// When content is received, start generating the response
if (delta.content !== undefined && delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
}
}
main();
Response
====================Thinking Process====================
The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.
I am an AI assistant named Kimi, developed by Moonshot AI. I should introduce myself clearly and concisely, including the following:
1. My identity: AI assistant
2. My developer: Moonshot AI
3. My name: Kimi
4. My core capabilities: long-text processing, intelligent conversation, file processing, search, etc.
I should maintain a friendly and professional tone, avoiding overly technical jargon so that regular users can easily understand. I should also emphasize that I am an AI without personal consciousness, emotions, or experiences to avoid misunderstandings.
Response structure:
- Directly state my identity
- Mention my developer
- Briefly introduce my core capabilities
- Keep it concise and clear
====================Complete Response====================
I am an AI assistant named Kimi, developed by Moonshot AI.
I am good at:
- Long-text understanding and generation
- Intelligent conversation and Q&A
- File processing and analysis
- Information retrieval and integration
As an AI assistant, I do not have personal consciousness, emotions, or experiences, but I will do my best to provide you with accurate and helpful assistance. How can I help you?
HTTP
Sample code
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2-thinking",
"messages": [
{
"role": "user",
"content": "Who are you?"
}
]
}'
Response
{
"choices": [
{
"message": {
"content": "I am an AI assistant named Kimi, developed by Moonshot AI. I am skilled at long-text processing, intelligent conversation, file analysis, programming assistance, and complex task reasoning. I can help you answer questions, create content, and analyze documents. How can I help you?",
"reasoning_content": "The user asks \"Who are you?\", which is a direct question about my identity. I need to answer truthfully based on my actual identity.\n\nI am an AI assistant named Kimi, developed by Moonshot AI. I should introduce myself clearly and concisely, including the following:\n1. My identity: AI assistant\n2. My developer: Moonshot AI\n3. My name: Kimi\n4. My core capabilities: long-text processing, intelligent conversation, file processing, search, etc.\n\nI should maintain a friendly and professional tone while providing useful information. No need to overcomplicate it; a direct answer is sufficient.",
"role": "assistant"
},
"finish_reason": "stop",
"index": 0,
"logprobs": null
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 8,
"completion_tokens": 183,
"total_tokens": 191
},
"created": 1762753998,
"system_fingerprint": null,
"model": "kimi-k2-thinking",
"id": "chatcmpl-485ab490-90ec-48c3-85fa-1c732b683db2"
}
DashScope
Python
Sample code
import os
from dashscope import Generation

# Initialize request parameters
messages = [{"role": "user", "content": "Who are you?"}]
completion = Generation.call(
    # If the environment variable is not set, replace this with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="kimi-k2-thinking",
    messages=messages,
    result_format="message",  # Set the result format to message
    stream=True,  # Enable streaming output
    incremental_output=True,  # Enable incremental output
)
reasoning_content = ""  # Complete thinking process
answer_content = ""  # Complete response
is_answering = False  # Indicates whether the model has started generating the response
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
for chunk in completion:
    message = chunk.output.choices[0].message
    # Collect only the thinking content
    if message.reasoning_content:
        if not is_answering:
            print(message.reasoning_content, end="", flush=True)
        reasoning_content += message.reasoning_content
    # When content is received, start generating the response
    if message.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
            is_answering = True
        print(message.content, end="", flush=True)
        answer_content += message.content
# After the loop, the reasoning_content and answer_content variables contain the complete content.
# You can perform further processing here as needed.
# print(f"\n\nComplete thinking process:\n{reasoning_content}")
# print(f"\nComplete response:\n{answer_content}")
Response
====================Thinking Process====================
The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.
I am an AI assistant named Kimi, developed by Moonshot AI. I should state this clearly and concisely.
Key information to include the following:
1. My name: Kimi
2. My developer: Moonshot AI
3. My nature: Artificial intelligence assistant
4. What I can do: Answer questions, assist with creation, etc.
I should maintain a friendly and helpful tone while accurately stating my identity. I should not pretend to be human or have a personal identity.
A suitable response could be:
"I am Kimi, an artificial intelligence assistant developed by Moonshot AI. I can help you with various tasks such as answering questions, creating content, and analyzing documents. How can I help you?"
This response is direct, accurate, and invites further interaction.
====================Complete Response====================
I am Kimi, an artificial intelligence assistant developed by Moonshot AI. I can help you with various tasks such as answering questions, creating content, and analyzing documents. How can I help you?
Java
Sample code
// DashScope SDK version >= 2.19.4
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class Main {
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(GenerationResult message) {
String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
String content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (reasoning != null && !reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Thinking Process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (content != null && !content.isEmpty()) {
finalContent.append(content);
if (!isFirstPrint) {
System.out.println("\n====================Complete Response====================");
isFirstPrint = true;
}
System.out.print(content);
}
}
private static GenerationParam buildGenerationParam(Message userMsg) {
return GenerationParam.builder()
// If the environment variable is not set, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("kimi-k2-thinking")
.incrementalOutput(true)
.resultFormat("message")
.messages(Arrays.asList(userMsg))
.build();
}
public static void streamCallWithMessage(Generation gen, Message userMsg)
throws NoApiKeyException, ApiException, InputRequiredException {
GenerationParam param = buildGenerationParam(userMsg);
Flowable<GenerationResult> result = gen.streamCall(param);
result.blockingForEach(message -> handleGenerationResult(message));
}
public static void main(String[] args) {
try {
Generation gen = new Generation();
Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
streamCallWithMessage(gen, userMsg);
// Print the final result
// if (reasoningContent.length() > 0) {
// System.out.println("\n====================Complete Response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
}
}
Response
====================Thinking Process====================
The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.
I am an AI assistant named Kimi, developed by Moonshot AI. I should state this clearly and concisely.
The response should include the following:
1. My identity: AI assistant
2. My developer: Moonshot AI
3. My name: Kimi
4. My core capabilities: long-text processing, intelligent conversation, file processing, etc.
I should not pretend to be human or provide excessive technical details. A clear and friendly answer is sufficient.
====================Complete Response====================
I am an AI assistant named Kimi, developed by Moonshot AI. I am skilled at long-text processing, intelligent conversation, answering questions, assisting with creation, and helping you analyze and process files. How can I help you?
HTTP
Sample code
curl
curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2-thinking",
"input":{
"messages":[
{
"role": "user",
"content": "Who are you?"
}
]
},
"parameters": {
"result_format": "message"
}
}'
Response
{
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"content": "I am Kimi, an artificial intelligence assistant developed by Moonshot AI. I can help you answer questions, create content, analyze documents, and write code. How can I help you?",
"reasoning_content": "The user asks \"Who are you?\", which is a direct question about my identity. I need to answer truthfully based on my actual identity.\n\nI am an AI assistant named Kimi, developed by Moonshot AI. I should state this clearly and concisely.\n\nKey information to include the following:\n1. My name: Kimi\n2. My developer: Moonshot AI\n3. My nature: Artificial intelligence assistant\n4. What I can do: Answer questions, assist with creation, etc.\n\nI should respond in a friendly and direct manner that is easy for the user to understand.",
"role": "assistant"
}
}
]
},
"usage": {
"input_tokens": 9,
"output_tokens": 156,
"total_tokens": 165
},
"request_id": "709a0697-ed1f-4298-82c9-a4b878da1849"
}
kimi-k2.5 multimodal examples
kimi-k2.5 can process text, image, or video input simultaneously.
Enable or disable thinking mode
kimi-k2.5 is a hybrid thinking model that can respond either after a thinking process or directly. Control this behavior using the enable_thinking parameter. Valid values: true, false (default).
The following examples show how to use an image URL and enable thinking mode. The main example uses a single image, and the commented-out code shows multi-image input.
OpenAI compatible
Python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
# Example of passing a single image (thinking mode enabled)
completion = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What scene is depicted in the image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    }
                }
            ]
        }
    ],
    extra_body={"enable_thinking": True}  # Enable thinking mode
)
# Print the thinking process
if hasattr(completion.choices[0].message, 'reasoning_content') and completion.choices[0].message.reasoning_content:
    print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
    print(completion.choices[0].message.reasoning_content)
# Print the response content
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
print(completion.choices[0].message.content)

# Example of passing multiple images (thinking mode enabled, uncomment to use)
# completion = client.chat.completions.create(
#     model="kimi-k2.5",
#     messages=[
#         {
#             "role": "user",
#             "content": [
#                 {"type": "text", "text": "What do these images depict?"},
#                 {
#                     "type": "image_url",
#                     "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}
#                 },
#                 {
#                     "type": "image_url",
#                     "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}
#                 }
#             ]
#         }
#     ],
#     extra_body={"enable_thinking": True}
# )
#
# # Print the thinking process and response
# if hasattr(completion.choices[0].message, 'reasoning_content') and completion.choices[0].message.reasoning_content:
#     print("\nThinking Process:\n" + completion.choices[0].message.reasoning_content)
# print("\nComplete Response:\n" + completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
import process from 'process';
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});
// Example of passing a single image (thinking mode enabled)
const completion = await openai.chat.completions.create({
model: 'kimi-k2.5',
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'What scene is depicted in the image?' },
{
type: 'image_url',
image_url: {
url: 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg'
}
}
]
}
],
enable_thinking: true // Enable thinking mode
});
// Print the thinking process
if (completion.choices[0].message.reasoning_content) {
console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
console.log(completion.choices[0].message.reasoning_content);
}
// Print the response content
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
console.log(completion.choices[0].message.content);
// Example of passing multiple images (thinking mode enabled, uncomment to use)
// const multiCompletion = await openai.chat.completions.create({
// model: 'kimi-k2.5',
// messages: [
// {
// role: 'user',
// content: [
// { type: 'text', text: 'What do these images depict?' },
// {
// type: 'image_url',
// image_url: { url: 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg' }
// },
// {
// type: 'image_url',
// image_url: { url: 'https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png' }
// }
// ]
// }
// ],
// enable_thinking: true
// });
//
// // Print the thinking process and response
// if (multiCompletion.choices[0].message.reasoning_content) {
// console.log('\nThinking Process:\n' + multiCompletion.choices[0].message.reasoning_content);
// }
// console.log('\nComplete Response:\n' + multiCompletion.choices[0].message.content);
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2.5",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What scene is depicted in the image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
}
}
]
}
],
"enable_thinking": true
}'
# Multi-image input example (uncomment to use)
# curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
# -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
# -H "Content-Type: application/json" \
# -d '{
# "model": "kimi-k2.5",
# "messages": [
# {
# "role": "user",
# "content": [
# {
# "type": "text",
# "text": "What do these images depict?"
# },
# {
# "type": "image_url",
# "image_url": {
# "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
# }
# },
# {
# "type": "image_url",
# "image_url": {
# "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
# }
# }
# ]
# }
# ],
# "enable_thinking": true,
# "stream": false
# }'
DashScope
Python
import os
from dashscope import MultiModalConversation

# Example of passing a single image (thinking mode enabled)
response = MultiModalConversation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "What scene is depicted in the image?"},
                {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}
            ]
        }
    ],
    enable_thinking=True  # Enable thinking mode
)
# Print the thinking process
if hasattr(response.output.choices[0].message, 'reasoning_content') and response.output.choices[0].message.reasoning_content:
    print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
    print(response.output.choices[0].message.reasoning_content)
# Print the response content
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
print(response.output.choices[0].message.content[0]["text"])

# Example of passing multiple images (thinking mode enabled, uncomment to use)
# response = MultiModalConversation.call(
#     api_key=os.getenv("DASHSCOPE_API_KEY"),
#     model="kimi-k2.5",
#     messages=[
#         {
#             "role": "user",
#             "content": [
#                 {"text": "What do these images depict?"},
#                 {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
#                 {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}
#             ]
#         }
#     ],
#     enable_thinking=True
# )
#
# # Print the thinking process and response
# if hasattr(response.output.choices[0].message, 'reasoning_content') and response.output.choices[0].message.reasoning_content:
#     print("\nThinking Process:\n" + response.output.choices[0].message.reasoning_content)
# print("\nComplete Response:\n" + response.output.choices[0].message.content[0]["text"])
Java
// DashScope SDK version >= 2.19.4
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
public class KimiK25MultiModalExample {
public static void main(String[] args) {
try {
// Single-image input example (thinking mode enabled)
MultiModalConversation conv = new MultiModalConversation();
// Build the message content
Map<String, Object> textContent = new HashMap<>();
textContent.put("text", "What scene is depicted in the image?");
Map<String, Object> imageContent = new HashMap<>();
imageContent.put("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg");
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(textContent, imageContent))
.build();
// Build the request parameters
MultiModalConversationParam param = MultiModalConversationParam.builder()
// If the environment variable is not set, replace this with your Model Studio API key
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("kimi-k2.5")
.messages(Arrays.asList(userMessage))
.enableThinking(true) // Enable thinking mode
.build();
// Call the model
MultiModalConversationResult result = conv.call(param);
// Print the result
String content = result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text");
System.out.println("Response content: " + content);
// If thinking mode is enabled, print the thinking process
if (result.getOutput().getChoices().get(0).getMessage().getReasoningContent() != null) {
System.out.println("\nThinking process: " +
result.getOutput().getChoices().get(0).getMessage().getReasoningContent());
}
// Multi-image input example (uncomment to use)
// Map<String, Object> imageContent1 = new HashMap<>();
// imageContent1.put("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg");
// Map<String, Object> imageContent2 = new HashMap<>();
// imageContent2.put("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png");
//
// Map<String, Object> textContent2 = new HashMap<>();
// textContent2.put("text", "What do these images depict?");
//
// MultiModalMessage multiImageMessage = MultiModalMessage.builder()
// .role(Role.USER.getValue())
// .content(Arrays.asList(textContent2, imageContent1, imageContent2))
// .build();
//
// MultiModalConversationParam multiParam = MultiModalConversationParam.builder()
// .apiKey(System.getenv("DASHSCOPE_API_KEY"))
// .model("kimi-k2.5")
// .messages(Arrays.asList(multiImageMessage))
// .enableThinking(true)
// .build();
//
// MultiModalConversationResult multiResult = conv.call(multiParam);
// System.out.println(multiResult.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
System.err.println("Call failed: " + e.getMessage());
}
}
}
curl
curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2.5",
"input": {
"messages": [
{
"role": "user",
"content": [
{
"text": "What scene is depicted in the image?"
},
{
"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
}
]
}
]
},
"parameters": {
"enable_thinking": true
}
}'
# Multi-image input example (uncomment to use)
# curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
# -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
# -H "Content-Type: application/json" \
# -d '{
# "model": "kimi-k2.5",
# "input": {
# "messages": [
# {
# "role": "user",
# "content": [
# {
# "text": "What do these images depict?"
# },
# {
# "image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
# },
# {
# "image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
# }
# ]
# }
# ]
# },
# "parameters": {
# "enable_thinking": true
# }
# }'
Video understanding
Video files
kimi-k2.5 analyzes video content by extracting frames. Control the frame-extraction strategy using the following two parameters:
fps: Controls the frame-extraction frequency. One frame is extracted every 1/fps seconds. Valid values: 0.1 to 10. Default value: 2.0. For fast-moving scenes, use a higher fps value to capture more detail. For static or long videos, use a lower fps value to improve processing efficiency.
max_frames: Limits the maximum number of frames extracted from a video. Default and maximum value: 2000. If the total number of frames calculated from fps exceeds this limit, the system automatically samples frames evenly within the max_frames cap. This parameter is available only when using the DashScope SDK.
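The interaction between fps and max_frames described above reduces to simple arithmetic. The helper below is purely illustrative: the function name is an assumption for this sketch, not part of the DashScope SDK, and it only models the frame count, not which frames are chosen.

```python
def planned_frame_count(duration_seconds: float, fps: float = 2.0,
                        max_frames: int = 2000) -> int:
    """Number of frames the extractor would sample from a video.

    One frame is taken every 1/fps seconds; if that exceeds max_frames,
    frames are instead sampled evenly within the max_frames cap.
    """
    by_fps = int(duration_seconds * fps)
    return min(by_fps, max_frames)

print(planned_frame_count(60))         # 1-minute clip at the default 2.0 fps → 120
print(planned_frame_count(3600, 2.0))  # 1-hour video would need 7200 frames, capped at 2000
```

This is why lowering fps helps with long videos: it keeps the fps-based count under the cap so frames stay evenly spaced at the requested rate.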
OpenAI compatible
When using the OpenAI SDK or the HTTP method to pass a video file directly to the model, set the "type" parameter in the user message to "video_url".
Python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": [
                # When passing a video file directly, set the "type" value to "video_url"
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                    },
                    "fps": 2
                },
                {
                    "type": "text",
                    "text": "What is the content of this video?"
                }
            ]
        }
    ]
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});
async function main() {
const response = await openai.chat.completions.create({
model: "kimi-k2.5",
messages: [
{
role: "user",
content: [
// When passing a video file directly, set the "type" value to "video_url"
{
type: "video_url",
video_url: {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
},
"fps": 2
},
{
type: "text",
text: "What is the content of this video?"
}
]
}
]
});
console.log(response.choices[0].message.content);
}
main();
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "kimi-k2.5",
"messages": [
{
"role": "user",
"content": [
{
"type": "video_url",
"video_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
},
"fps":2
},
{
"type": "text",
"text": "What is the content of this video?"
}
]
}
]
}'
DashScope
Python
import dashscope
import os
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
messages = [
{"role": "user",
"content": [
# The fps parameter controls the frame extraction frequency, indicating one frame is extracted every 1/fps seconds
{"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4","fps":2},
{"text": "What is the content of this video?"}
]
}
]
response = dashscope.MultiModalConversation.call(
# If the environment variable is not set, replace the following line with your Model Studio API key: api_key ="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='kimi-k2.5',
messages=messages
)
print(response.output.choices[0].message.content[0]["text"])
Java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
// The fps parameter controls the frame extraction frequency, indicating one frame is extracted every 1/fps seconds
Map<String, Object> params = new HashMap<>();
params.put("video", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4");
params.put("fps", 2);
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
params,
Collections.singletonMap("text", "What is the content of this video?"))).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("kimi-k2.5")
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "kimi-k2.5",
"input":{
"messages":[
{"role": "user","content": [{"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4","fps":2},
{"text": "What is the content of this video?"}]}]}
}'
Image list
When a video is submitted as a list of images (pre-extracted video frames), use the fps parameter to tell the model the time interval between frames. This helps the model better understand the order of events, their duration, and dynamic changes. The model also supports specifying the frame-extraction rate for an original video through the fps parameter, meaning one frame is extracted from the original video every 1/fps seconds.
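When you extract the frames yourself, the fps value determines which source frames to keep: one frame every 1/fps seconds. A minimal sketch of the index arithmetic (pure Python; the function name and example values are ours, not part of the API):

```python
def sample_frame_indices(frame_count: int, source_fps: float, target_fps: float) -> list[int]:
    """Return the indices of frames to keep so that roughly one frame
    is sampled every 1/target_fps seconds of the original video."""
    step = source_fps / target_fps  # source frames per sampled frame
    indices = []
    i = 0.0
    while round(i) < frame_count:
        indices.append(round(i))
        i += step
    return indices

# A 90-frame, 30 fps clip sampled at fps=2 keeps every 15th frame:
print(sample_frame_indices(90, source_fps=30, target_fps=2))  # → [0, 15, 30, 45, 60, 75]
```

The resulting frames can then be passed as the image list shown below, with the same fps value in the request so the model knows the sampling interval.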
OpenAI compatible
When using the OpenAI SDK or the HTTP method to submit a video as an image list, set the "type" parameter in the user message to "video".
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="kimi-k2.5",
messages=[{"role": "user","content": [
# When passing an image list, set the "type" parameter in the user message to "video"
{"type": "video","video": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
"fps":2},
{"type": "text","text": "Describe the specific process of this video"},
]}]
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});
async function main() {
const response = await openai.chat.completions.create({
model: "kimi-k2.5",
messages: [{
role: "user",
content: [
{
// When passing an image list, set the "type" parameter in the user message to "video"
type: "video",
video: [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
"fps":2
},
{
type: "text",
text: "Describe the specific process of this video"
}
]
}]
});
console.log(response.choices[0].message.content);
}
main();
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "kimi-k2.5",
"messages": [{"role": "user","content": [{"type": "video","video": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
"fps":2},
{"type": "text","text": "Describe the specific process of this video"}]}]
}'
DashScope
Python
import os
import dashscope
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
messages = [{"role": "user",
"content": [
{"video":["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
"fps":2},
{"text": "Describe the specific process of this video"}]}]
response = dashscope.MultiModalConversation.call(
# If the environment variable is not set, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
model='kimi-k2.5',
messages=messages
)
print(response.output.choices[0].message.content[0]["text"])
Java
// DashScope SDK version must be at least 2.21.10
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
private static final String MODEL_NAME = "kimi-k2.5";
public static void videoImageListSample() throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
Map<String, Object> params = new HashMap<>();
params.put("video", Arrays.asList("https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"));
params.put("fps", 2);
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
params,
Collections.singletonMap("text", "Describe the specific process of this video")))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(MODEL_NAME)
.messages(Arrays.asList(userMessage)).build();
MultiModalConversationResult result = conv.call(param);
System.out.print(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
videoImageListSample();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "kimi-k2.5",
"input": {
"messages": [
{
"role": "user",
"content": [
{
"video": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
],
"fps":2
},
{
"text": "Describe the specific process of this video"
}
]
}
]
}
}'
Pass local files
The following examples show how to pass local files. The OpenAI-compatible API supports only Base64 encoding. The DashScope SDK supports both Base64 encoding and local file paths.
OpenAI compatible
To pass Base64-encoded data, construct a Data URL. For instructions, see Construct a Data URL.
Python
from openai import OpenAI
import os
import base64
# Encoding function: Converts a local file to a Base64 encoded string
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
# Replace xxx/eagle.png with the absolute path of your local image
base64_image = encode_image("xxx/eagle.png")
client = OpenAI(
api_key=os.getenv('DASHSCOPE_API_KEY'),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="kimi-k2.5",
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {"url": f"data:image/png;base64,{base64_image}"},
},
{"type": "text", "text": "What scene is depicted in the image?"},
],
}
],
)
print(completion.choices[0].message.content)
# The following are examples for passing a local video file and an image list
# 【Local video file】Encode the local video as a Data URL and pass it in video_url:
# def encode_video_to_data_url(video_path):
# with open(video_path, "rb") as f:
# return "data:video/mp4;base64," + base64.b64encode(f.read()).decode("utf-8")
# video_data_url = encode_video_to_data_url("xxx/local.mp4")
# content = [{"type": "video_url", "video_url": {"url": video_data_url}, "fps": 2}, {"type": "text", "text": "What is the content of this video?"}]
# 【Local image list】Encode multiple local images as Base64 and form a video list:
# image_data_urls = [f"data:image/jpeg;base64,{encode_image(p)}" for p in ["xxx/f1.jpg", "xxx/f2.jpg", "xxx/f3.jpg", "xxx/f4.jpg"]]
# content = [{"type": "video", "video": image_data_urls, "fps": 2}, {"type": "text", "text": "Describe the specific process of this video"}]
Node.js
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
}
);
const encodeImage = (imagePath) => {
const imageFile = readFileSync(imagePath);
return imageFile.toString('base64');
};
// Replace xxx/eagle.png with the absolute path of your local image
const base64Image = encodeImage("xxx/eagle.png")
async function main() {
const completion = await openai.chat.completions.create({
model: "kimi-k2.5",
messages: [
{"role": "user",
"content": [{"type": "image_url",
"image_url": {"url": `data:image/png;base64,${base64Image}`},},
{"type": "text", "text": "What scene is depicted in the image?"}]}]
});
console.log(completion.choices[0].message.content);
}
main();
// The following are examples for passing a local video file and an image list
// 【Local video file】Encode the local video as a Data URL and pass it in video_url:
// const encodeVideoToDataUrl = (videoPath) => "data:video/mp4;base64," + readFileSync(videoPath).toString("base64");
// const videoDataUrl = encodeVideoToDataUrl("xxx/local.mp4");
// content: [{ type: "video_url", video_url: { url: videoDataUrl }, fps: 2 }, { type: "text", text: "What is the content of this video?" }]
// 【Local image list】Encode multiple local images as Base64 and form a video list:
// const imageDataUrls = ["xxx/f1.jpg","xxx/f2.jpg","xxx/f3.jpg","xxx/f4.jpg"].map(p => `data:image/jpeg;base64,${encodeImage(p)}`);
// content: [{ type: "video", video: imageDataUrls, fps: 2 }, { type: "text", text: "Describe the specific process of this video" }]
// messages: [{ role: "user", content: content }]
// Then call openai.chat.completions.create({ model: "kimi-k2.5", messages: messages })
DashScope
Base64 encoding method
To use Base64 encoding, construct a Data URL. For instructions, see Construct a Data URL.
Python
import base64
import os
import dashscope
from dashscope import MultiModalConversation
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
# Encoding function: Converts a local file to a Base64 encoded string
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
# Replace xxx/eagle.png with the absolute path of your local image
base64_image = encode_image("xxx/eagle.png")
messages = [
{
"role": "user",
"content": [
{"image": f"data:image/png;base64,{base64_image}"},
{"text": "What scene is depicted in the image?"},
],
},
]
response = MultiModalConversation.call(
# If the environment variable is not set, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="kimi-k2.5",
messages=messages,
)
print(response.output.choices[0].message.content[0]["text"])
# The following are examples for passing a local video file and an image list
# 【Local video file】
# video_data_url = "data:video/mp4;base64," + base64.b64encode(open("xxx/local.mp4","rb").read()).decode("utf-8")
# content: [{"video": video_data_url, "fps": 2}, {"text": "What is the content of this video?"}]
# 【Local image list】
# Base64: image_data_urls = [f"data:image/jpeg;base64,{encode_image(p)}" for p in ["xxx/f1.jpg","xxx/f2.jpg","xxx/f3.jpg","xxx/f4.jpg"]]
# content: [{"video": image_data_urls, "fps": 2}, {"text": "Describe the specific process of this video"}]
Java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Base64;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import com.alibaba.dashscope.aigc.multimodalconversation.*;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
private static String encodeToBase64(String imagePath) throws IOException {
Path path = Paths.get(imagePath);
byte[] imageBytes = Files.readAllBytes(path);
return Base64.getEncoder().encodeToString(imageBytes);
}
public static void callWithLocalFile(String localPath) throws ApiException, NoApiKeyException, UploadFileException, IOException {
String base64Image = encodeToBase64(localPath); // Base64 encoding
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
new HashMap<String, Object>() {{ put("image", "data:image/png;base64," + base64Image); }},
new HashMap<String, Object>() {{ put("text", "What scene is depicted in the image?"); }}
)).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("kimi-k2.5")
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
// Replace xxx/eagle.png with the absolute path of your local image
callWithLocalFile("xxx/eagle.png");
} catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
    // The following are examples for passing a local video file and an image list
    // 【Local video file】
    // String base64Video = encodeToBase64(localPath);
    // MultiModalConversation conv = new MultiModalConversation();
    // MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //         .content(Arrays.asList(
    //                 new HashMap<String, Object>() {{ put("video", "data:video/mp4;base64," + base64Video); }},
    //                 new HashMap<String, Object>() {{ put("text", "What is the content of this video?"); }}
    //         )).build();
    // 【Local image list】
    // List<String> urls = Arrays.asList(
    //         "data:image/jpeg;base64," + encodeToBase64("path/f1.jpg"),
    //         "data:image/jpeg;base64," + encodeToBase64("path/f2.jpg"),
    //         "data:image/jpeg;base64," + encodeToBase64("path/f3.jpg"),
    //         "data:image/jpeg;base64," + encodeToBase64("path/f4.jpg"));
    // MultiModalConversation conv = new MultiModalConversation();
    // MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //         .content(Arrays.asList(
    //                 new HashMap<String, Object>() {{ put("video", urls); }},
    //                 new HashMap<String, Object>() {{ put("text", "Describe the specific process of this video"); }}
    //         )).build();
}
Local file path method
Pass the local file path directly to the model. This method is supported only by the DashScope Python and Java SDKs; it is not supported by DashScope HTTP or the OpenAI-compatible methods. See the following table to determine the file path format for your programming language and operating system.
Python
import os
from dashscope import MultiModalConversation
import dashscope
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
# Replace xxx/eagle.png with the absolute path of your local image
local_path = "xxx/eagle.png"
image_path = f"file://{local_path}"
messages = [
{'role':'user',
'content': [{'image': image_path},
{'text': 'What scene is depicted in the image?'}]}]
response = MultiModalConversation.call(
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='kimi-k2.5',
messages=messages)
print(response.output.choices[0].message.content[0]["text"])
# The following are examples for passing a video and image list using local file paths
# 【Local video file】
# video_path = "file:///path/to/local.mp4"
# content: [{"video": video_path, "fps": 2}, {"text": "What is the content of this video?"}]
# 【Local image list】
# image_paths = ["file:///path/f1.jpg", "file:///path/f2.jpg", "file:///path/f3.jpg", "file:///path/f4.jpg"]
# content: [{"video": image_paths, "fps": 2}, {"text": "Describe the specific process of this video"}]
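The file:// prefix used in the paths above can be built portably with Python's pathlib; this is a convenience sketch rather than part of the SDK. Note that as_uri() percent-encodes non-ASCII characters, so plain ASCII paths are safest if the SDK's handling of encoded paths is unverified; the path below is a hypothetical example.

```python
from pathlib import Path

def to_file_url(local_path: str) -> str:
    """Build a file:// URL from a local path; pathlib normalizes the
    platform differences (POSIX paths vs. Windows drive letters)."""
    return Path(local_path).absolute().as_uri()

print(to_file_url("/data/images/eagle.png"))  # → file:///data/images/eagle.png
```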
Java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
public static void callWithLocalFile(String localPath)
throws ApiException, NoApiKeyException, UploadFileException {
String filePath = "file://"+localPath;
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(new HashMap<String, Object>(){{put("image", filePath);}},
new HashMap<String, Object>(){{put("text", "What scene is depicted in the image?");}})).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("kimi-k2.5")
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));}
public static void main(String[] args) {
try {
// Replace xxx/eagle.png with the absolute path of your local image
callWithLocalFile("xxx/eagle.png");
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
    // The following are examples for passing a video and image list using local file paths
    // 【Local video file】
    // String filePath = "file://" + localPath;
    // MultiModalConversation conv = new MultiModalConversation();
    // MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //         .content(Arrays.asList(new HashMap<String, Object>(){{put("video", filePath);}},
    //                 new HashMap<String, Object>(){{put("text", "What is the content of this video?");}})).build();
    // 【Local image list】
    // MultiModalConversation conv = new MultiModalConversation();
    // List<String> filePaths = Arrays.asList("file:///path/f1.jpg", "file:///path/f2.jpg", "file:///path/f3.jpg", "file:///path/f4.jpg");
    // MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //         .content(Arrays.asList(new HashMap<String, Object>(){{put("video", filePaths);}},
    //                 new HashMap<String, Object>(){{put("text", "Describe the specific process of this video");}})).build();
}
File limits
Image limits
Image resolution:
Minimum size: The width and height of an image must both be greater than 10 pixels.
Aspect ratio: The ratio of the long side to the short side must not exceed 200:1.
Pixel limit: We recommend keeping image resolution within 8K (7680×4320). Images above this resolution may cause API call timeouts because of large file sizes and long network transmission times.
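The resolution rules above can be checked locally before upload. A minimal sketch (the limits are taken from this section; the function name is ours):

```python
def check_image_dims(width: int, height: int) -> list[str]:
    """Validate image dimensions against the documented limits:
    both sides > 10 px, aspect ratio <= 200:1, at most 8K (7680x4320) recommended."""
    problems = []
    if width <= 10 or height <= 10:
        problems.append("width and height must both exceed 10 px")
    long_side, short_side = max(width, height), min(width, height)
    if short_side and long_side / short_side > 200:
        problems.append("aspect ratio exceeds 200:1")
    if width * height > 7680 * 4320:
        problems.append("resolution above 8K may cause API timeouts")
    return problems

print(check_image_dims(1920, 1080))  # → []
print(check_image_dims(4000, 15))    # → ['aspect ratio exceeds 200:1']
```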
Supported image formats
For resolutions below 4K (3840×2160), the following image formats are supported:
Image format | Common file extensions | MIME type
BMP | .bmp | image/bmp
JPEG | .jpe, .jpeg, .jpg | image/jpeg
PNG | .png | image/png
TIFF | .tif, .tiff | image/tiff
WEBP | .webp | image/webp
HEIC | .heic | image/heic
For resolutions between 4K (3840×2160) and 8K (7680×4320), only the JPEG, JPG, and PNG formats are supported.
Image size:
When passing a public URL or local path: A single image must not exceed 10 MB.
When passing a Base64-encoded string: The encoded string must not exceed 10 MB.
To reduce file size, see How to compress an image or video to a target size.
Supported number of images: When you pass multiple images, the number of images is limited by the model's maximum input tokens. The total token count of all images and text combined must stay below this limit.
Video limits
Submitted as an image list: At least 4 images and at most 2,000 images.
Submitted as a video file:
Video size:
When submitted as a public URL: At most 2 GB.
When submitted as a Base64-encoded string: Less than 10 MB.
When submitted as a local file path: The video itself must not exceed 100 MB.
Video duration: 2 seconds to 1 hour.
Video formats: MP4, AVI, MKV, MOV, FLV, WMV, and others.
Video resolution: No specific limit. We recommend staying below 2K; higher resolutions increase processing time without improving the model's understanding.
Audio understanding: Audio tracks in video files are not supported.
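Base64 encoding expands data by a factor of about 4/3, so a video must be roughly 7.5 MB on disk to stay under the 10 MB encoded limit. A quick size check before encoding (the 10 MB limit comes from this section; the helper is ours):

```python
import base64

def base64_size(raw_bytes: int) -> int:
    """Exact size in bytes of the Base64 encoding of raw_bytes input bytes
    (4 output characters for every 3 input bytes, with padding)."""
    return 4 * ((raw_bytes + 2) // 3)

MB = 1024 * 1024
print(base64_size(8 * MB) <= 10 * MB)  # → False: an 8 MB file encodes to about 10.7 MB
```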
Model features
Model
kimi-k2.5
kimi-k2-thinking
Moonshot-Kimi-K2-Instruct
Default parameter values
Model | enable_thinking | temperature | top_p | presence_penalty | fps | max_frames |
kimi-k2.5 | false | Thinking mode: 1.0 Non-thinking mode: 0.6 | Thinking/non-thinking mode: 0.95 | Thinking/non-thinking mode: 0.0 | 2 | 2000 |
kimi-k2-thinking | - | 1.0 | - | - | - | - |
Moonshot-Kimi-K2-Instruct | - | 0.6 | 1.0 | 0 | - | - |
A hyphen (-) indicates that the parameter has no default value and cannot be set.
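These defaults apply when a parameter is omitted from the request; to override them, pass the parameter explicitly. A minimal sketch of the request arguments for the OpenAI-compatible interface (no network call is made here; whether kimi-k2.5 reads enable_thinking from extra_body is our assumption, mirroring how other Model Studio models toggle thinking mode):

```python
# Keyword arguments for client.chat.completions.create(); shown as a plain dict
# so the request shape is visible without making a network call.
request_kwargs = {
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.6,  # override the thinking-mode default of 1.0
    "top_p": 0.95,
    # Assumption: thinking mode toggled via extra_body, as on other Model Studio models.
    "extra_body": {"enable_thinking": True},
}
print(sorted(request_kwargs))
```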
Error codes
If a model call fails and returns an error message, see Error messages to troubleshoot the issue.