This topic describes how to call Kimi series models on the Alibaba Cloud Model Studio platform using either the OpenAI-compatible API or the DashScope SDK.
This document applies only to the China (Beijing) region. To use the model, you must use an API key from the China (Beijing) region.
Model introduction
The Kimi series models are large language models (LLMs) developed by Moonshot AI.
kimi-k2.5: Kimi's most intelligent model to date, achieving open-source state-of-the-art (SOTA) performance on agent, coding, visual understanding, and a range of general intelligence tasks. It is also Kimi's most versatile model to date, featuring a native multimodal architecture that supports both visual and text input, thinking and non-thinking modes, and both conversational and agent tasks.
kimi-k2-thinking: Supports only deep thinking mode. It returns the reasoning process in the reasoning_content field. This model excels at coding and tool calling and is suitable for scenarios that require logical analysis, planning, or deep understanding.
Moonshot-Kimi-K2-Instruct: Does not support deep thinking. It generates responses directly for faster performance. This model is suitable for scenarios that require quick and direct answers.
| Model | Mode | Context window (tokens) | Max input (tokens) | Max CoT (tokens) | Max response (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| kimi-k2.5 | Thinking | 262,144 | 258,048 | 32,768 | 32,768 | $0.574 | $3.011 |
| kimi-k2.5 | Non-thinking | 262,144 | 260,096 | - | 32,768 | $0.574 | $3.011 |
| kimi-k2-thinking | Thinking | 262,144 | 229,376 | 32,768 | 16,384 | $0.574 | $2.294 |
| Moonshot-Kimi-K2-Instruct | Non-thinking | 131,072 | 131,072 | - | 8,192 | $0.574 | $2.294 |
The above models are not third-party services. They are all deployed on Model Studio servers.
Text generation example
Before you use the API, you must obtain an API key and set it as an environment variable. If you make calls using an SDK, you must also install the SDK.
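If you are not sure whether the environment variable is visible to your program, a quick check like the following minimal sketch (the variable name DASHSCOPE_API_KEY matches the examples in this topic) can save debugging time:
import os

# Fail fast with a clear message if the API key is not configured
api_key = os.getenv("DASHSCOPE_API_KEY")
if not api_key:
    raise RuntimeError(
        "DASHSCOPE_API_KEY is not set. Export your Model Studio API key "
        "from the China (Beijing) region before running the examples."
    )
print("API key detected, length:", len(api_key))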
OpenAI compatible
Python
Sample code
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="kimi-k2-thinking",
messages=[{"role": "user", "content": "Who are you?"}],
stream=True,
)
reasoning_content = "" # Complete thinking process
answer_content = "" # Complete response
is_answering = False # Indicates whether the model has started generating the response
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
for chunk in completion:
if chunk.choices:
delta = chunk.choices[0].delta
# Collect only the thinking content
if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
if not is_answering:
print(delta.reasoning_content, end="", flush=True)
reasoning_content += delta.reasoning_content
# When content is received, start generating the response
if hasattr(delta, "content") and delta.content:
if not is_answering:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
is_answering = True
print(delta.content, end="", flush=True)
answer_content += delta.content
Response
====================Thinking Process====================
The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.
I am an AI assistant named Kimi, developed by Moonshot AI. I should introduce myself clearly and concisely, including the following:
1. My identity: AI assistant
2. My developer: Moonshot AI
3. My name: Kimi
4. My core capabilities: long-text processing, intelligent conversation, file processing, search, etc.
I should maintain a friendly and professional tone, avoiding overly technical jargon so that regular users can understand. I should also emphasize that I am an AI without personal consciousness, emotions, or experiences.
Response structure:
- Directly state my identity
- Mention my developer
- Briefly introduce my core capabilities
- Keep it concise and clear
====================Complete Response====================
I am an AI assistant named Kimi, developed by Moonshot AI. I am based on a Mixture-of-Experts (MoE) architecture and have capabilities such as long-context understanding, intelligent conversation, file processing, code generation, and complex task reasoning. How can I help you?
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';
// Initialize the OpenAI client
const openai = new OpenAI({
// If the environment variable is not set, replace this with your Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = ''; // Complete thinking process
let answerContent = ''; // Complete response
let isAnswering = false; // Indicates whether the model has started generating the response
async function main() {
const messages = [{ role: 'user', content: 'Who are you?' }];
const stream = await openai.chat.completions.create({
model: 'kimi-k2-thinking',
messages,
stream: true,
});
console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
for await (const chunk of stream) {
if (chunk.choices?.length) {
const delta = chunk.choices[0].delta;
// Collect only the thinking content
if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
if (!isAnswering) {
process.stdout.write(delta.reasoning_content);
}
reasoningContent += delta.reasoning_content;
}
// When content is received, start generating the response
if (delta.content !== undefined && delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
}
}
main();
Response
====================Thinking Process====================
The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.
I am an AI assistant named Kimi, developed by Moonshot AI. I should introduce myself clearly and concisely, including the following:
1. My identity: AI assistant
2. My developer: Moonshot AI
3. My name: Kimi
4. My core capabilities: long-text processing, intelligent conversation, file processing, search, etc.
I should maintain a friendly and professional tone, avoiding overly technical jargon so that regular users can easily understand. I should also emphasize that I am an AI without personal consciousness, emotions, or experiences to avoid misunderstandings.
Response structure:
- Directly state my identity
- Mention my developer
- Briefly introduce my core capabilities
- Keep it concise and clear
====================Complete Response====================
I am an AI assistant named Kimi, developed by Moonshot AI.
I am good at:
- Long-text understanding and generation
- Intelligent conversation and Q&A
- File processing and analysis
- Information retrieval and integration
As an AI assistant, I do not have personal consciousness, emotions, or experiences, but I will do my best to provide you with accurate and helpful assistance. How can I help you?
HTTP
Sample code
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2-thinking",
"messages": [
{
"role": "user",
"content": "Who are you?"
}
]
}'
Response
{
"choices": [
{
"message": {
"content": "I am an AI assistant named Kimi, developed by Moonshot AI. I am skilled at long-text processing, intelligent conversation, file analysis, programming assistance, and complex task reasoning. I can help you answer questions, create content, and analyze documents. How can I help you?",
"reasoning_content": "The user asks \"Who are you?\", which is a direct question about my identity. I need to answer truthfully based on my actual identity.\n\nI am an AI assistant named Kimi, developed by Moonshot AI. I should introduce myself clearly and concisely, including the following:\n1. My identity: AI assistant\n2. My developer: Moonshot AI\n3. My name: Kimi\n4. My core capabilities: long-text processing, intelligent conversation, file processing, search, etc.\n\nI should maintain a friendly and professional tone while providing useful information. No need to overcomplicate it; a direct answer is sufficient.",
"role": "assistant"
},
"finish_reason": "stop",
"index": 0,
"logprobs": null
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 8,
"completion_tokens": 183,
"total_tokens": 191
},
"created": 1762753998,
"system_fingerprint": null,
"model": "kimi-k2-thinking",
"id": "chatcmpl-485ab490-90ec-48c3-85fa-1c732b683db2"
}
DashScope
Python
Sample code
import os
from dashscope import Generation
# Initialize request parameters
messages = [{"role": "user", "content": "Who are you?"}]
completion = Generation.call(
# If the environment variable is not set, replace this with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="kimi-k2-thinking",
messages=messages,
result_format="message", # Set the result format to message
stream=True, # Enable streaming output
incremental_output=True, # Enable incremental output
)
reasoning_content = "" # Complete thinking process
answer_content = "" # Complete response
is_answering = False # Indicates whether the model has started generating the response
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
for chunk in completion:
message = chunk.output.choices[0].message
# Collect only the thinking content
if message.reasoning_content:
if not is_answering:
print(message.reasoning_content, end="", flush=True)
reasoning_content += message.reasoning_content
# When content is received, start generating the response
if message.content:
if not is_answering:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
is_answering = True
print(message.content, end="", flush=True)
answer_content += message.content
# After the loop, the reasoning_content and answer_content variables contain the complete content.
# You can perform further processing here as needed.
# print(f"\n\nComplete thinking process:\n{reasoning_content}")
# print(f"\nComplete response:\n{answer_content}")
Response
====================Thinking Process====================
The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.
I am an AI assistant named Kimi, developed by Moonshot AI. I should state this clearly and concisely.
Key information to include the following:
1. My name: Kimi
2. My developer: Moonshot AI
3. My nature: Artificial intelligence assistant
4. What I can do: Answer questions, assist with creation, etc.
I should maintain a friendly and helpful tone while accurately stating my identity. I should not pretend to be human or have a personal identity.
A suitable response could be:
"I am Kimi, an artificial intelligence assistant developed by Moonshot AI. I can help you with various tasks such as answering questions, creating content, and analyzing documents. How can I help you?"
This response is direct, accurate, and invites further interaction.
====================Complete Response====================
I am Kimi, an artificial intelligence assistant developed by Moonshot AI. I can help you with various tasks such as answering questions, creating content, and analyzing documents. How can I help you?
Java
Sample code
// DashScope SDK version >= 2.19.4
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class Main {
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(GenerationResult message) {
String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
String content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (reasoning != null && !reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Thinking Process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (content != null && !content.isEmpty()) {
finalContent.append(content);
if (!isFirstPrint) {
System.out.println("\n====================Complete Response====================");
isFirstPrint = true;
}
System.out.print(content);
}
}
private static GenerationParam buildGenerationParam(Message userMsg) {
return GenerationParam.builder()
// If the environment variable is not set, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("kimi-k2-thinking")
.incrementalOutput(true)
.resultFormat("message")
.messages(Arrays.asList(userMsg))
.build();
}
public static void streamCallWithMessage(Generation gen, Message userMsg)
throws NoApiKeyException, ApiException, InputRequiredException {
GenerationParam param = buildGenerationParam(userMsg);
Flowable<GenerationResult> result = gen.streamCall(param);
result.blockingForEach(message -> handleGenerationResult(message));
}
public static void main(String[] args) {
try {
Generation gen = new Generation();
Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
streamCallWithMessage(gen, userMsg);
// Print the final result
// if (reasoningContent.length() > 0) {
// System.out.println("\n====================Complete Response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
}
}
Response
====================Thinking Process====================
The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.
I am an AI assistant named Kimi, developed by Moonshot AI. I should state this clearly and concisely.
The response should include the following:
1. My identity: AI assistant
2. My developer: Moonshot AI
3. My name: Kimi
4. My core capabilities: long-text processing, intelligent conversation, file processing, etc.
I should not pretend to be human or provide excessive technical details. A clear and friendly answer is sufficient.
====================Complete Response====================
I am an AI assistant named Kimi, developed by Moonshot AI. I am skilled at long-text processing, intelligent conversation, answering questions, assisting with creation, and helping you analyze and process files. How can I help you?
HTTP
Sample code
curl
curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2-thinking",
"input":{
"messages":[
{
"role": "user",
"content": "Who are you?"
}
]
},
"parameters": {
"result_format": "message"
}
}'
Response
{
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"content": "I am Kimi, an artificial intelligence assistant developed by Moonshot AI. I can help you answer questions, create content, analyze documents, and write code. How can I help you?",
"reasoning_content": "The user asks \"Who are you?\", which is a direct question about my identity. I need to answer truthfully based on my actual identity.\n\nI am an AI assistant named Kimi, developed by Moonshot AI. I should state this clearly and concisely.\n\nKey information to include the following:\n1. My name: Kimi\n2. My developer: Moonshot AI\n3. My nature: Artificial intelligence assistant\n4. What I can do: Answer questions, assist with creation, etc.\n\nI should respond in a friendly and direct manner that is easy for the user to understand.",
"role": "assistant"
}
}
]
},
"usage": {
"input_tokens": 9,
"output_tokens": 156,
"total_tokens": 165
},
"request_id": "709a0697-ed1f-4298-82c9-a4b878da1849"
}
kimi-k2.5 multimodal example
kimi-k2.5 can process text, images, or video inputs simultaneously.
Enable or disable thinking mode
kimi-k2.5 is a hybrid thinking model that can respond either after a thinking process or directly. Control this behavior using the enable_thinking parameter:
true: Enables thinking mode. The model reasons before answering and returns the reasoning process in the reasoning_content field.
false (default): Disables thinking mode. The model answers directly.
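For reference, the following minimal sketch calls kimi-k2.5 with thinking mode disabled. The explicit enable_thinking value and the text-only prompt are illustrative; because false is the default, omitting the parameter has the same effect.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
# Non-thinking mode: the model answers directly and returns no reasoning_content
completion = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Summarize what you can do in one sentence."}],
    extra_body={"enable_thinking": False},  # false is the default; shown here for clarity
)
print(completion.choices[0].message.content)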
The following examples show how to use an image URL and enable thinking mode. The main example uses a single image, while the commented-out code demonstrates multi-image input.
OpenAI compatible
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
# Example of passing a single image (thinking mode enabled)
completion = client.chat.completions.create(
model="kimi-k2.5",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What scene is depicted in the image?"},
{
"type": "image_url",
"image_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
}
}
]
}
],
extra_body={"enable_thinking":True} # Enable thinking mode
)
# Print the thinking process
if hasattr(completion.choices[0].message, 'reasoning_content') and completion.choices[0].message.reasoning_content:
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
print(completion.choices[0].message.reasoning_content)
# Print the response content
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
print(completion.choices[0].message.content)
# Example of passing multiple images (thinking mode enabled, uncomment to use)
# completion = client.chat.completions.create(
# model="kimi-k2.5",
# messages=[
# {
# "role": "user",
# "content": [
# {"type": "text", "text": "What do these images depict?"},
# {
# "type": "image_url",
# "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}
# },
# {
# "type": "image_url",
# "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}
# }
# ]
# }
# ],
# extra_body={"enable_thinking":True}
# )
#
# # Print the thinking process and response
# if hasattr(completion.choices[0].message, 'reasoning_content') and completion.choices[0].message.reasoning_content:
# print("\nThinking Process:\n" + completion.choices[0].message.reasoning_content)
# print("\nComplete Response:\n" + completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
import process from 'process';
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});
// Example of passing a single image (thinking mode enabled)
const completion = await openai.chat.completions.create({
model: 'kimi-k2.5',
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'What scene is depicted in the image?' },
{
type: 'image_url',
image_url: {
url: 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg'
}
}
]
}
],
enable_thinking: true // Enable thinking mode
});
// Print the thinking process
if (completion.choices[0].message.reasoning_content) {
console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
console.log(completion.choices[0].message.reasoning_content);
}
// Print the response content
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
console.log(completion.choices[0].message.content);
// Example of passing multiple images (thinking mode enabled, uncomment to use)
// const multiCompletion = await openai.chat.completions.create({
// model: 'kimi-k2.5',
// messages: [
// {
// role: 'user',
// content: [
// { type: 'text', text: 'What do these images depict?' },
// {
// type: 'image_url',
// image_url: { url: 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg' }
// },
// {
// type: 'image_url',
// image_url: { url: 'https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png' }
// }
// ]
// }
// ],
// enable_thinking: true
// });
//
// // Print the thinking process and response
// if (multiCompletion.choices[0].message.reasoning_content) {
// console.log('\nThinking Process:\n' + multiCompletion.choices[0].message.reasoning_content);
// }
// console.log('\nComplete Response:\n' + multiCompletion.choices[0].message.content);
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2.5",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What scene is depicted in the image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
}
}
]
}
],
"enable_thinking": true
}'
# Multi-image input example (uncomment to use)
# curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
# -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
# -H "Content-Type: application/json" \
# -d '{
# "model": "kimi-k2.5",
# "messages": [
# {
# "role": "user",
# "content": [
# {
# "type": "text",
# "text": "What do these images depict?"
# },
# {
# "type": "image_url",
# "image_url": {
# "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
# }
# },
# {
# "type": "image_url",
# "image_url": {
# "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
# }
# }
# ]
# }
# ],
# "enable_thinking": true,
# "stream": false
# }'
DashScope
Python
import os
from dashscope import MultiModalConversation
# Example of passing a single image (thinking mode enabled)
response = MultiModalConversation.call(
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="kimi-k2.5",
messages=[
{
"role": "user",
"content": [
{"text": "What scene is depicted in the image?"},
{"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}
]
}
],
enable_thinking=True # Enable thinking mode
)
# Print the thinking process
if hasattr(response.output.choices[0].message, 'reasoning_content') and response.output.choices[0].message.reasoning_content:
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
print(response.output.choices[0].message.reasoning_content)
# Print the response content
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
print(response.output.choices[0].message.content[0]["text"])
# Example of passing multiple images (thinking mode enabled, uncomment to use)
# response = MultiModalConversation.call(
# api_key=os.getenv("DASHSCOPE_API_KEY"),
# model="kimi-k2.5",
# messages=[
# {
# "role": "user",
# "content": [
# {"text": "What do these images depict?"},
# {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
# {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}
# ]
# }
# ],
# enable_thinking=True
# )
#
# # Print the thinking process and response
# if hasattr(response.output.choices[0].message, 'reasoning_content') and response.output.choices[0].message.reasoning_content:
# print("\nThinking Process:\n" + response.output.choices[0].message.reasoning_content)
# print("\nComplete Response:\n" + response.output.choices[0].message.content[0]["text"])
Java
// DashScope SDK version >= 2.19.4
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
public class KimiK25MultiModalExample {
public static void main(String[] args) {
try {
// Single-image input example (thinking mode enabled)
MultiModalConversation conv = new MultiModalConversation();
// Build the message content
Map<String, Object> textContent = new HashMap<>();
textContent.put("text", "What scene is depicted in the image?");
Map<String, Object> imageContent = new HashMap<>();
imageContent.put("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg");
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(textContent, imageContent))
.build();
// Build the request parameters
MultiModalConversationParam param = MultiModalConversationParam.builder()
// If the environment variable is not set, replace this with your Model Studio API key
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("kimi-k2.5")
.messages(Arrays.asList(userMessage))
.enableThinking(true) // Enable thinking mode
.build();
// Call the model
MultiModalConversationResult result = conv.call(param);
// Print the result
String content = result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text");
System.out.println("Response content: " + content);
// If thinking mode is enabled, print the thinking process
if (result.getOutput().getChoices().get(0).getMessage().getReasoningContent() != null) {
System.out.println("\nThinking process: " +
result.getOutput().getChoices().get(0).getMessage().getReasoningContent());
}
// Multi-image input example (uncomment to use)
// Map<String, Object> imageContent1 = new HashMap<>();
// imageContent1.put("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg");
// Map<String, Object> imageContent2 = new HashMap<>();
// imageContent2.put("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png");
//
// Map<String, Object> textContent2 = new HashMap<>();
// textContent2.put("text", "What do these images depict?");
//
// MultiModalMessage multiImageMessage = MultiModalMessage.builder()
// .role(Role.USER.getValue())
// .content(Arrays.asList(textContent2, imageContent1, imageContent2))
// .build();
//
// MultiModalConversationParam multiParam = MultiModalConversationParam.builder()
// .apiKey(System.getenv("DASHSCOPE_API_KEY"))
// .model("kimi-k2.5")
// .messages(Arrays.asList(multiImageMessage))
// .enableThinking(true)
// .build();
//
// MultiModalConversationResult multiResult = conv.call(multiParam);
// System.out.println(multiResult.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
System.err.println("Call failed: " + e.getMessage());
}
}
}
curl
curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2.5",
"input": {
"messages": [
{
"role": "user",
"content": [
{
"text": "What scene is depicted in the image?"
},
{
"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
}
]
}
]
},
"parameters": {
"enable_thinking": true
}
}'
# Multi-image input example (uncomment to use)
# curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
# -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
# -H "Content-Type: application/json" \
# -d '{
# "model": "kimi-k2.5",
# "input": {
# "messages": [
# {
# "role": "user",
# "content": [
# {
# "text": "What do these images depict?"
# },
# {
# "image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
# },
# {
# "image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
# }
# ]
# }
# ]
# },
# "parameters": {
# "enable_thinking": true
# }
# }'
Video understanding
Video file
kimi-k2.5 analyzes video content by extracting frames. Control the frame extraction strategy using the following two parameters:
fps: Controls the frame extraction frequency; one frame is extracted every 1/fps seconds. Valid values: 0.1 to 10. Default value: 2.0. For fast-moving scenes, use a higher fps value to capture more detail. For static or long videos, use a lower fps value to improve processing efficiency.
max_frames: Limits the maximum number of frames extracted from a video. Default value and maximum value: 2000.
If the total number of frames calculated by fps exceeds this limit, the system automatically extracts frames uniformly within the max_frames limit. This parameter is available only when using the DashScope SDK.
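The video examples in this topic set only fps. If you also need to cap the number of extracted frames, a sketch like the following may work with the DashScope Python SDK. Placing max_frames next to fps inside the video content entry, and the cap of 512, are assumptions made for illustration; verify the parameter placement against the SDK parameter reference for your version.
import os
import dashscope

dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

messages = [{
    "role": "user",
    "content": [
        # Assumption: max_frames is passed alongside fps in the same video entry (DashScope SDK only)
        {"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4",
         "fps": 2,
         "max_frames": 512},
        {"text": "What is the content of this video?"}
    ]
}]
response = dashscope.MultiModalConversation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="kimi-k2.5",
    messages=messages,
)
print(response.output.choices[0].message.content[0]["text"])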
OpenAI compatible
When using the OpenAI SDK or HTTP method to pass a video file directly to the model, set the "type" parameter in the user message to "video_url".
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="kimi-k2.5",
messages=[
{
"role": "user",
"content": [
# When passing a video file directly, set the "type" value to "video_url"
{
"type": "video_url",
"video_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
},
"fps": 2
},
{
"type": "text",
"text": "What is the content of this video?"
}
]
}
]
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});
async function main() {
const response = await openai.chat.completions.create({
model: "kimi-k2.5",
messages: [
{
role: "user",
content: [
// When passing a video file directly, set the "type" value to "video_url"
{
type: "video_url",
video_url: {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
},
"fps": 2
},
{
type: "text",
text: "What is the content of this video?"
}
]
}
]
});
console.log(response.choices[0].message.content);
}
main();
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "kimi-k2.5",
"messages": [
{
"role": "user",
"content": [
{
"type": "video_url",
"video_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
},
"fps":2
},
{
"type": "text",
"text": "What is the content of this video?"
}
]
}
]
}'
DashScope
Python
import dashscope
import os
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
messages = [
{"role": "user",
"content": [
# The fps parameter controls the frame extraction frequency, indicating one frame is extracted every 1/fps seconds
{"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4","fps":2},
{"text": "What is the content of this video?"}
]
}
]
response = dashscope.MultiModalConversation.call(
# If the environment variable is not set, replace the following line with your Model Studio API key: api_key ="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='kimi-k2.5',
messages=messages
)
print(response.output.choices[0].message.content[0]["text"])
Java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
// The fps parameter controls the frame extraction frequency, indicating one frame is extracted every 1/fps seconds
Map<String, Object> params = new HashMap<>();
params.put("video", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4");
params.put("fps", 2);
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
params,
Collections.singletonMap("text", "What is the content of this video?"))).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("kimi-k2.5")
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "kimi-k2.5",
"input":{
"messages":[
{"role": "user","content": [{"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4","fps":2},
{"text": "What is the content of this video?"}]}]}
}'
Image list
When a video is passed as an image list (pre-extracted video frames), use the fps parameter to tell the model the time interval between frames, that is, that one frame was extracted from the original video every 1/fps seconds. This helps the model better understand event order, duration, and dynamic changes.
OpenAI compatible
When using the OpenAI SDK or HTTP method to pass a video as an image list, set the "type" parameter in the user message to "video".
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="kimi-k2.5",
messages=[{"role": "user","content": [
# When passing an image list, set the "type" parameter in the user message to "video"
{"type": "video","video": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
"fps":2},
{"type": "text","text": "Describe the specific process of this video"},
]}]
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});
async function main() {
const response = await openai.chat.completions.create({
model: "kimi-k2.5",
messages: [{
role: "user",
content: [
{
// When passing an image list, set the "type" parameter in the user message to "video"
type: "video",
video: [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
"fps":2
},
{
type: "text",
text: "Describe the specific process of this video"
}
]
}]
});
console.log(response.choices[0].message.content);
}
main();
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "kimi-k2.5",
"messages": [{"role": "user","content": [{"type": "video","video": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
"fps":2},
{"type": "text","text": "Describe the specific process of this video"}]}]
}'
DashScope
Python
import os
import dashscope
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
messages = [{"role": "user",
"content": [
{"video":["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
"fps":2},
{"text": "Describe the specific process of this video"}]}]
response = dashscope.MultiModalConversation.call(
# If the environment variable is not set, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
model='kimi-k2.5',
messages=messages
)
print(response.output.choices[0].message.content[0]["text"])
Java
// DashScope SDK version must be at least 2.21.10
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
private static final String MODEL_NAME = "kimi-k2.5";
public static void videoImageListSample() throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
Map<String, Object> params = new HashMap<>();
params.put("video", Arrays.asList("https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"));
params.put("fps", 2);
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
params,
Collections.singletonMap("text", "Describe the specific process of this video")))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(MODEL_NAME)
.messages(Arrays.asList(userMessage)).build();
MultiModalConversationResult result = conv.call(param);
System.out.print(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
videoImageListSample();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "kimi-k2.5",
"input": {
"messages": [
{
"role": "user",
"content": [
{
"video": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
],
"fps":2
},
{
"text": "Describe the specific process of this video"
}
]
}
]
}
}'
Pass a local file
The following examples show how to pass a local file. The OpenAI-compatible API supports only Base64 encoding. The DashScope SDK supports both Base64 encoding and file paths.
OpenAI compatible
To pass Base64-encoded data, construct a Data URL. For instructions, see Construct a Data URL.
Python
from openai import OpenAI
import os
import base64
# Encoding function: Converts a local file to a Base64 encoded string
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
# Replace xxx/eagle.png with the absolute path of your local image
base64_image = encode_image("xxx/eagle.png")
client = OpenAI(
api_key=os.getenv('DASHSCOPE_API_KEY'),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="kimi-k2.5",
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {"url": f"data:image/png;base64,{base64_image}"},
},
{"type": "text", "text": "What scene is depicted in the image?"},
],
}
],
)
print(completion.choices[0].message.content)
# The following are examples for passing a local video file and an image list
# [Local video file] Encode the local video as a Data URL and pass it in video_url:
# def encode_video_to_data_url(video_path):
# with open(video_path, "rb") as f:
# return "data:video/mp4;base64," + base64.b64encode(f.read()).decode("utf-8")
# video_data_url = encode_video_to_data_url("xxx/local.mp4")
# content = [{"type": "video_url", "video_url": {"url": video_data_url}, "fps": 2}, {"type": "text", "text": "What is the content of this video?"}]
# [Local image list] Encode multiple local images as Base64 and form a video list:
# image_data_urls = [f"data:image/jpeg;base64,{encode_image(p)}" for p in ["xxx/f1.jpg", "xxx/f2.jpg", "xxx/f3.jpg", "xxx/f4.jpg"]]
# content = [{"type": "video", "video": image_data_urls, "fps": 2}, {"type": "text", "text": "Describe the specific process of this video"}]
Node.js
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
}
);
const encodeImage = (imagePath) => {
const imageFile = readFileSync(imagePath);
return imageFile.toString('base64');
};
// Replace xxx/eagle.png with the absolute path of your local image
const base64Image = encodeImage("xxx/eagle.png")
async function main() {
const completion = await openai.chat.completions.create({
model: "kimi-k2.5",
messages: [
{"role": "user",
"content": [{"type": "image_url",
"image_url": {"url": `data:image/png;base64,${base64Image}`},},
{"type": "text", "text": "What scene is depicted in the image?"}]}]
});
console.log(completion.choices[0].message.content);
}
main();
// The following are examples for passing a local video file and an image list
// [Local video file] Encode the local video as a Data URL and pass it in video_url:
// const encodeVideoToDataUrl = (videoPath) => "data:video/mp4;base64," + readFileSync(videoPath).toString("base64");
// const videoDataUrl = encodeVideoToDataUrl("xxx/local.mp4");
// content: [{ type: "video_url", video_url: { url: videoDataUrl }, fps: 2 }, { type: "text", text: "What is the content of this video?" }]
// [Local image list] Encode multiple local images as Base64 and form a video list:
// const imageDataUrls = ["xxx/f1.jpg","xxx/f2.jpg","xxx/f3.jpg","xxx/f4.jpg"].map(p => `data:image/jpeg;base64,${encodeImage(p)}`);
// content: [{ type: "video", video: imageDataUrls, fps: 2 }, { type: "text", text: "Describe the specific process of this video" }]
// messages: [{ role: "user", content: content }]
// Then call openai.chat.completions.create({ model: "kimi-k2.5", messages: messages })
DashScope
Base64 encoding method
To use Base64 encoding, construct a Data URL. For instructions, see Construct a Data URL.
Python
import base64
import os
import dashscope
from dashscope import MultiModalConversation
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
# Encoding function: Converts a local file to a Base64 encoded string
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
# Replace xxx/eagle.png with the absolute path of your local image
base64_image = encode_image("xxx/eagle.png")
messages = [
{
"role": "user",
"content": [
{"image": f"data:image/png;base64,{base64_image}"},
{"text": "What scene is depicted in the image?"},
],
},
]
response = MultiModalConversation.call(
# If the environment variable is not set, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="kimi-k2.5",
messages=messages,
)
print(response.output.choices[0].message.content[0]["text"])
# The following are examples for passing a local video file and an image list
# [Local video file]
# video_data_url = "data:video/mp4;base64," + base64.b64encode(open("xxx/local.mp4","rb").read()).decode("utf-8")
# content: [{"video": video_data_url, "fps": 2}, {"text": "What is the content of this video?"}]
# [Local image list]
# Base64: image_data_urls = [f"data:image/jpeg;base64,{encode_image(p)}" for p in ["xxx/f1.jpg","xxx/f2.jpg","xxx/f3.jpg","xxx/f4.jpg"]]
# content: [{"video": image_data_urls, "fps": 2}, {"text": "Describe the specific process of this video"}]
Java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Base64;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import com.alibaba.dashscope.aigc.multimodalconversation.*;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
private static String encodeToBase64(String imagePath) throws IOException {
Path path = Paths.get(imagePath);
byte[] imageBytes = Files.readAllBytes(path);
return Base64.getEncoder().encodeToString(imageBytes);
}
public static void callWithLocalFile(String localPath) throws ApiException, NoApiKeyException, UploadFileException, IOException {
String base64Image = encodeToBase64(localPath); // Base64 encoding
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
new HashMap<String, Object>() {{ put("image", "data:image/png;base64," + base64Image); }},
new HashMap<String, Object>() {{ put("text", "What scene is depicted in the image?"); }}
)).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("kimi-k2.5")
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
// Replace xxx/eagle.png with the absolute path of your local image
callWithLocalFile("xxx/eagle.png");
} catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
// The following are examples for passing a local video file and an image list
// [Local video file]
// String base64Video = encodeToBase64(localPath);
// MultiModalConversation conv = new MultiModalConversation();
// MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
//     .content(Arrays.asList(
//         new HashMap<String, Object>() {{ put("video", "data:video/mp4;base64," + base64Video); }},
//         new HashMap<String, Object>() {{ put("text", "What is the content of this video?"); }}
//     )).build();
// [Local image list]
// List<String> urls = Arrays.asList(
//     "data:image/jpeg;base64," + encodeToBase64("path/f1.jpg"),
//     "data:image/jpeg;base64," + encodeToBase64("path/f2.jpg"),
//     "data:image/jpeg;base64," + encodeToBase64("path/f3.jpg"),
//     "data:image/jpeg;base64," + encodeToBase64("path/f4.jpg"));
// MultiModalConversation conv = new MultiModalConversation();
// MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
//     .content(Arrays.asList(
//         new HashMap<String, Object>() {{ put("video", urls); }},
//         new HashMap<String, Object>() {{ put("text", "Describe the specific process of this video"); }}
//     )).build();
}
Local file path method
Pass the local file path directly to the model. This method is supported only by the DashScope Python and Java SDKs. It is not supported by DashScope HTTP or the OpenAI-compatible method. Prefix the absolute local path with file://, as shown in the following examples.
Python
import os
from dashscope import MultiModalConversation
import dashscope
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
# Replace xxx/eagle.png with the absolute path of your local image
local_path = "xxx/eagle.png"
image_path = f"file://{local_path}"
messages = [
{'role':'user',
'content': [{'image': image_path},
{'text': 'What scene is depicted in the image?'}]}]
response = MultiModalConversation.call(
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='kimi-k2.5',
messages=messages)
print(response.output.choices[0].message.content[0]["text"])
# The following are examples for passing a video and image list using local file paths
# [Local video file]
# video_path = "file:///path/to/local.mp4"
# content: [{"video": video_path, "fps": 2}, {"text": "What is the content of this video?"}]
# [Local image list]
# image_paths = ["file:///path/f1.jpg", "file:///path/f2.jpg", "file:///path/f3.jpg", "file:///path/f4.jpg"]
# content: [{"video": image_paths, "fps": 2}, {"text": "Describe the specific process of this video"}]
Java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
public static void callWithLocalFile(String localPath)
throws ApiException, NoApiKeyException, UploadFileException {
String filePath = "file://"+localPath;
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(new HashMap<String, Object>(){{put("image", filePath);}},
new HashMap<String, Object>(){{put("text", "What scene is depicted in the image?");}})).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("kimi-k2.5")
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));}
public static void main(String[] args) {
try {
// Replace xxx/eagle.png with the absolute path of your local image
callWithLocalFile("xxx/eagle.png");
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
// The following are examples for passing a video and image list using local file paths
// [Local video file]
// String filePath = "file://" + localPath;
// MultiModalConversation conv = new MultiModalConversation();
// MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
//     .content(Arrays.asList(new HashMap<String, Object>(){{put("video", filePath);}},
//         new HashMap<String, Object>(){{put("text", "What is the content of this video?");}})).build();
// [Local image list]
// MultiModalConversation conv = new MultiModalConversation();
// List<String> filePath = Arrays.asList("file:///path/f1.jpg", "file:///path/f2.jpg", "file:///path/f3.jpg", "file:///path/f4.jpg");
// MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
//     .content(Arrays.asList(new HashMap<String, Object>(){{put("video", filePath);}},
//         new HashMap<String, Object>(){{put("text", "Describe the specific process of this video");}})).build();
}
File limitations
Image limitations
Image resolution:
- Minimum size: The width and height of the image must both be greater than 10 pixels.
- Aspect ratio: The ratio of the long side to the short side of the image must not exceed 200:1.
- Pixel limit: We recommend that you keep the image resolution within 8K (7680×4320). Images that exceed this resolution may cause API calls to time out because of large file sizes and long network transmission times.
Supported image formats:
For resolutions below 4K (3840×2160), the following image formats are supported:

| Image format | Common file extensions | MIME type |
| --- | --- | --- |
| BMP | .bmp | image/bmp |
| JPEG | .jpe, .jpeg, .jpg | image/jpeg |
| PNG | .png | image/png |
| TIFF | .tif, .tiff | image/tiff |
| WEBP | .webp | image/webp |
| HEIC | .heic | image/heic |

For resolutions between 4K (3840×2160) and 8K (7680×4320), only JPEG, JPG, and PNG formats are supported.
Image size:
- When passing a public URL or local path: The size of a single image cannot exceed 10 MB.
- When passing a Base64-encoded string: The size of the encoded string cannot exceed 10 MB.

To reduce the file size, see How to compress an image or video to the required size.
Number of supported images: When you pass multiple images, the number of images is limited by the model's maximum input tokens. The total number of tokens for all images and text combined must be less than this limit.
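If you want to validate images locally before uploading them, a small helper such as the following sketch can catch most violations early. It uses Pillow, which is not part of the Model Studio SDKs, and interprets the 8K limit as a total pixel budget; adjust the checks to your own needs.
import os
from PIL import Image

def check_image(path, max_bytes=10 * 1024 * 1024):
    """Check an image against the documented limits and return a list of problems."""
    problems = []
    if os.path.getsize(path) > max_bytes:
        problems.append("file is larger than 10 MB")
    with Image.open(path) as img:
        w, h = img.size
        if w <= 10 or h <= 10:
            problems.append("width and height must both be greater than 10 pixels")
        elif max(w, h) / min(w, h) > 200:
            problems.append("aspect ratio exceeds 200:1")
        if w * h > 7680 * 4320:
            problems.append("resolution exceeds the recommended 8K (7680x4320) pixel budget")
    return problems

print(check_image("xxx/eagle.png"))  # replace with the path to your local image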
Video limitations
- Passed as an image list: Minimum of 4 images, maximum of 2000 images.
- Passed as a video file:
  - Video size:
    - When passed as a public URL: Up to 2 GB.
    - When passed as a Base64-encoded string: Less than 10 MB.
    - When passed as a local file path: The video itself must not exceed 100 MB.
  - Video duration: 2 seconds to 1 hour.
  - Video format: MP4, AVI, MKV, MOV, FLV, WMV, etc.
  - Video resolution: No specific limit. We recommend keeping it under 2K. Higher resolutions increase processing time without improving model understanding.
  - Audio understanding: Audio in video files is not supported.
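As a rough rule of thumb derived from the fps and max_frames descriptions above, you can estimate how many frames will be extracted from a video file; the helper below is only an illustration of that arithmetic.
def estimated_frames(duration_seconds: float, fps: float = 2.0, max_frames: int = 2000) -> int:
    """Rough estimate: one frame every 1/fps seconds, capped at max_frames."""
    return min(int(duration_seconds * fps), max_frames)

print(estimated_frames(30 * 60))    # a 30-minute video at the default fps of 2 hits the 2000-frame cap
print(estimated_frames(90, fps=2))  # a 90-second clip yields about 180 frames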
Model features
(Feature comparison for kimi-k2.5, kimi-k2-thinking, and Moonshot-Kimi-K2-Instruct.)
Default parameter values
| Model | enable_thinking | temperature | top_p | presence_penalty | fps | max_frames |
| --- | --- | --- | --- | --- | --- | --- |
| kimi-k2.5 | false | Thinking mode: 1.0; Non-thinking mode: 0.6 | 0.95 (both modes) | 0.0 (both modes) | 2 | 2000 |
| kimi-k2-thinking | - | 1.0 | - | - | - | - |
| Moonshot-Kimi-K2-Instruct | - | 0.6 | 1.0 | 0 | - | - |
A hyphen (-) indicates that there is no default value and the parameter cannot be set.
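To override these defaults, pass the corresponding parameters on the request. The following minimal sketch uses the OpenAI-compatible API with kimi-k2.5; the values are illustrative, not tuning recommendations.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Write a two-line tagline for a travel app."}],
    temperature=0.6,        # overrides the default for the selected mode
    top_p=0.95,
    presence_penalty=0.0,
    extra_body={"enable_thinking": False},  # non-thinking mode
)
print(completion.choices[0].message.content)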
Error codes
If a model call fails and returns an error message, see Error messages to troubleshoot the issue.