This topic describes how to call Kimi series models on the Alibaba Cloud Model Studio platform using the OpenAI-compatible API or the DashScope SDK.
This document applies only to the China (Beijing) region. To use these models, you must use an API key from the China (Beijing) region.
Model introduction
The Kimi series models are large language models (LLMs) developed by Moonshot AI.
kimi-k2.5: The most intelligent Kimi model to date, achieving open-source SoTA performance in agent tasks, coding, visual understanding, and a wide range of general-intelligence tasks. It is also the most versatile Kimi model so far, with a native multimodal architecture that supports both text and visual input, thinking and non-thinking modes, and both dialog and agent tasks.
kimi-k2-thinking: Supports only deep thinking mode. Its reasoning process is returned in the reasoning_content field. The model excels at coding and tool calling, making it suitable for scenarios that require logical analysis, planning, or deep understanding.
Moonshot-Kimi-K2-Instruct: Does not support deep thinking. It generates responses directly for faster performance, making it suitable for scenarios that require quick, direct answers.
Model | Mode | Context window (tokens) | Max input (tokens) | Max CoT (tokens) | Max response (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
kimi-k2.5 | Thinking mode | 262,144 | 258,048 | 32,768 | 32,768 | $0.574 | $3.011 |
kimi-k2.5 | Non-thinking mode | 262,144 | 260,096 | - | 32,768 | $0.574 | $3.011 |
kimi-k2-thinking | Thinking mode | 262,144 | 229,376 | 32,768 | 16,384 | $0.574 | $2.294 |
Moonshot-Kimi-K2-Instruct | Non-thinking mode | 131,072 | 131,072 | - | 8,192 | $0.574 | $2.294 |
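As a quick sanity check on the pricing columns above, the cost of a call can be estimated from its token counts. The sketch below is illustrative only: the helper name and the hard-coded kimi-k2.5 rates are assumptions taken from the table, not part of any SDK, and it does not model any billing rules beyond per-token pricing.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price: float = 0.574, output_price: float = 3.011) -> float:
    """Estimate call cost in USD. Prices are per 1M tokens (kimi-k2.5 rates)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# A modest request: 10,000 input tokens and 2,000 output tokens
print(round(estimate_cost_usd(10_000, 2_000), 4))  # → 0.0118
```

For kimi-k2-thinking, the output rate would drop to 2.294 per the table; pass it as `output_price`.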
Text generation example
Before using the API, you must obtain an API key and set it as an environment variable. If you make calls using an SDK, install the SDK.
OpenAI compatible
Python
Sample code
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="kimi-k2-thinking",
    messages=[{"role": "user", "content": "Who are you?"}],
    stream=True,
)
reasoning_content = ""  # Complete thinking process
answer_content = ""  # Complete response
is_answering = False  # Indicates whether the model has started generating the response
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
for chunk in completion:
    if chunk.choices:
        delta = chunk.choices[0].delta
        # Collect only the thinking content
        if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
            if not is_answering:
                print(delta.reasoning_content, end="", flush=True)
            reasoning_content += delta.reasoning_content
        # When content is received, start generating the response
        if hasattr(delta, "content") and delta.content:
            if not is_answering:
                print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
                is_answering = True
            print(delta.content, end="", flush=True)
            answer_content += delta.content
Response
====================Thinking Process====================
The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.
I am an AI assistant named Kimi, developed by Moonshot AI. I should introduce myself clearly and concisely, including the following:
1. My identity: AI assistant
2. My developer: Moonshot AI
3. My name: Kimi
4. My core capabilities: long-text processing, intelligent conversation, file processing, search, etc.
I should maintain a friendly and professional tone, avoiding overly technical jargon so that regular users can understand. I should also emphasize that I am an AI without personal consciousness, emotions, or experiences.
Response structure:
- Directly state my identity
- Mention my developer
- Briefly introduce my core capabilities
- Keep it concise and clear
====================Complete Response====================
I am an AI assistant named Kimi, developed by Moonshot AI. I am based on a Mixture-of-Experts (MoE) architecture and have capabilities such as long-context understanding, intelligent conversation, file processing, code generation, and complex task reasoning. How can I help you?
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';
// Initialize the OpenAI client
const openai = new OpenAI({
// If the environment variable is not set, replace this with your Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = ''; // Complete thinking process
let answerContent = ''; // Complete response
let isAnswering = false; // Indicates whether the model has started generating the response
async function main() {
const messages = [{ role: 'user', content: 'Who are you?' }];
const stream = await openai.chat.completions.create({
model: 'kimi-k2-thinking',
messages,
stream: true,
});
console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
for await (const chunk of stream) {
if (chunk.choices?.length) {
const delta = chunk.choices[0].delta;
// Collect only the thinking content
if (delta.reasoning_content !== undefined && delta.reasoning_content !== null) {
if (!isAnswering) {
process.stdout.write(delta.reasoning_content);
}
reasoningContent += delta.reasoning_content;
}
// When content is received, start generating the response
if (delta.content !== undefined && delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
}
}
main();
Response
====================Thinking Process====================
The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.
I am an AI assistant named Kimi, developed by Moonshot AI. I should introduce myself clearly and concisely, including the following:
1. My identity: AI assistant
2. My developer: Moonshot AI
3. My name: Kimi
4. My core capabilities: long-text processing, intelligent conversation, file processing, search, etc.
I should maintain a friendly and professional tone, avoiding overly technical jargon so that regular users can easily understand. I should also emphasize that I am an AI without personal consciousness, emotions, or experiences to avoid misunderstandings.
Response structure:
- Directly state my identity
- Mention my developer
- Briefly introduce my core capabilities
- Keep it concise and clear
====================Complete Response====================
I am an AI assistant named Kimi, developed by Moonshot AI.
I am good at:
- Long-text understanding and generation
- Intelligent conversation and Q&A
- File processing and analysis
- Information retrieval and integration
As an AI assistant, I do not have personal consciousness, emotions, or experiences, but I will do my best to provide you with accurate and helpful assistance. How can I help you?
HTTP
Sample code
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2-thinking",
"messages": [
{
"role": "user",
"content": "Who are you?"
}
]
}'
Response
{
"choices": [
{
"message": {
"content": "I am an AI assistant named Kimi, developed by Moonshot AI. I am skilled at long-text processing, intelligent conversation, file analysis, programming assistance, and complex task reasoning. I can help you answer questions, create content, and analyze documents. How can I help you?",
"reasoning_content": "The user asks \"Who are you?\", which is a direct question about my identity. I need to answer truthfully based on my actual identity.\n\nI am an AI assistant named Kimi, developed by Moonshot AI. I should introduce myself clearly and concisely, including the following:\n1. My identity: AI assistant\n2. My developer: Moonshot AI\n3. My name: Kimi\n4. My core capabilities: long-text processing, intelligent conversation, file processing, search, etc.\n\nI should maintain a friendly and professional tone while providing useful information. No need to overcomplicate it; a direct answer is sufficient.",
"role": "assistant"
},
"finish_reason": "stop",
"index": 0,
"logprobs": null
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 8,
"completion_tokens": 183,
"total_tokens": 191
},
"created": 1762753998,
"system_fingerprint": null,
"model": "kimi-k2-thinking",
"id": "chatcmpl-485ab490-90ec-48c3-85fa-1c732b683db2"
}
DashScope
Python
Sample code
import os
from dashscope import Generation

# Initialize request parameters
messages = [{"role": "user", "content": "Who are you?"}]
completion = Generation.call(
    # If the environment variable is not set, replace this with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="kimi-k2-thinking",
    messages=messages,
    result_format="message",  # Set the result format to message
    stream=True,  # Enable streaming output
    incremental_output=True,  # Enable incremental output
)
reasoning_content = ""  # Complete thinking process
answer_content = ""  # Complete response
is_answering = False  # Indicates whether the model has started generating the response
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
for chunk in completion:
    message = chunk.output.choices[0].message
    # Collect only the thinking content
    if message.reasoning_content:
        if not is_answering:
            print(message.reasoning_content, end="", flush=True)
        reasoning_content += message.reasoning_content
    # When content is received, start generating the response
    if message.content:
        if not is_answering:
            print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
            is_answering = True
        print(message.content, end="", flush=True)
        answer_content += message.content
# After the loop, the reasoning_content and answer_content variables contain the complete content.
# You can perform further processing here as needed.
# print(f"\n\nComplete thinking process:\n{reasoning_content}")
# print(f"\nComplete response:\n{answer_content}")
Response
====================Thinking Process====================
The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.
I am an AI assistant named Kimi, developed by Moonshot AI. I should state this clearly and concisely.
Key information to include the following:
1. My name: Kimi
2. My developer: Moonshot AI
3. My nature: Artificial intelligence assistant
4. What I can do: Answer questions, assist with creation, etc.
I should maintain a friendly and helpful tone while accurately stating my identity. I should not pretend to be human or have a personal identity.
A suitable response could be:
"I am Kimi, an artificial intelligence assistant developed by Moonshot AI. I can help you with various tasks such as answering questions, creating content, and analyzing documents. How can I help you?"
This response is direct, accurate, and invites further interaction.
====================Complete Response====================
I am Kimi, an artificial intelligence assistant developed by Moonshot AI. I can help you with various tasks such as answering questions, creating content, and analyzing documents. How can I help you?
Java
Sample code
// DashScope SDK version >= 2.19.4
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class Main {
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(GenerationResult message) {
String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
String content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (reasoning != null && !reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Thinking Process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (content != null && !content.isEmpty()) {
finalContent.append(content);
if (!isFirstPrint) {
System.out.println("\n====================Complete Response====================");
isFirstPrint = true;
}
System.out.print(content);
}
}
private static GenerationParam buildGenerationParam(Message userMsg) {
return GenerationParam.builder()
// If the environment variable is not set, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("kimi-k2-thinking")
.incrementalOutput(true)
.resultFormat("message")
.messages(Arrays.asList(userMsg))
.build();
}
public static void streamCallWithMessage(Generation gen, Message userMsg)
throws NoApiKeyException, ApiException, InputRequiredException {
GenerationParam param = buildGenerationParam(userMsg);
Flowable<GenerationResult> result = gen.streamCall(param);
result.blockingForEach(message -> handleGenerationResult(message));
}
public static void main(String[] args) {
try {
Generation gen = new Generation();
Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you?").build();
streamCallWithMessage(gen, userMsg);
// Print the final result
// if (reasoningContent.length() > 0) {
// System.out.println("\n====================Complete Response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
}
}
Response
====================Thinking Process====================
The user asks "Who are you?", which is a direct question about my identity. I need to answer truthfully based on my actual identity.
I am an AI assistant named Kimi, developed by Moonshot AI. I should state this clearly and concisely.
The response should include the following:
1. My identity: AI assistant
2. My developer: Moonshot AI
3. My name: Kimi
4. My core capabilities: long-text processing, intelligent conversation, file processing, etc.
I should not pretend to be human or provide excessive technical details. A clear and friendly answer is sufficient.
====================Complete Response====================
I am an AI assistant named Kimi, developed by Moonshot AI. I am skilled at long-text processing, intelligent conversation, answering questions, assisting with creation, and helping you analyze and process files. How can I help you?
HTTP
Sample code
curl
curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2-thinking",
"input":{
"messages":[
{
"role": "user",
"content": "Who are you?"
}
]
},
"parameters": {
"result_format": "message"
}
}'
Response
{
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"content": "I am Kimi, an artificial intelligence assistant developed by Moonshot AI. I can help you answer questions, create content, analyze documents, and write code. How can I help you?",
"reasoning_content": "The user asks \"Who are you?\", which is a direct question about my identity. I need to answer truthfully based on my actual identity.\n\nI am an AI assistant named Kimi, developed by Moonshot AI. I should state this clearly and concisely.\n\nKey information to include the following:\n1. My name: Kimi\n2. My developer: Moonshot AI\n3. My nature: Artificial intelligence assistant\n4. What I can do: Answer questions, assist with creation, etc.\n\nI should respond in a friendly and direct manner that is easy for the user to understand.",
"role": "assistant"
}
}
]
},
"usage": {
"input_tokens": 9,
"output_tokens": 156,
"total_tokens": 165
},
"request_id": "709a0697-ed1f-4298-82c9-a4b878da1849"
}
kimi-k2.5 multimodal examples
kimi-k2.5 can process text, image, or video input simultaneously.
Enable or disable thinking mode
kimi-k2.5 is a hybrid thinking model that can respond either after a thinking process or directly. Control this behavior using the enable_thinking parameter. Valid values: true, false (default).
The following examples show how to use an image URL and enable thinking mode. The main example uses a single image, and the commented-out code shows multi-image input.
OpenAI compatible
Python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
# Example of passing a single image (thinking mode enabled)
completion = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What scene is depicted in the image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    }
                }
            ]
        }
    ],
    extra_body={"enable_thinking": True}  # Enable thinking mode
)
# Print the thinking process
if hasattr(completion.choices[0].message, 'reasoning_content') and completion.choices[0].message.reasoning_content:
    print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
    print(completion.choices[0].message.reasoning_content)
# Print the response content
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
print(completion.choices[0].message.content)

# Example of passing multiple images (thinking mode enabled, uncomment to use)
# completion = client.chat.completions.create(
#     model="kimi-k2.5",
#     messages=[
#         {
#             "role": "user",
#             "content": [
#                 {"type": "text", "text": "What do these images depict?"},
#                 {
#                     "type": "image_url",
#                     "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}
#                 },
#                 {
#                     "type": "image_url",
#                     "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}
#                 }
#             ]
#         }
#     ],
#     extra_body={"enable_thinking": True}
# )
#
# # Print the thinking process and response
# if hasattr(completion.choices[0].message, 'reasoning_content') and completion.choices[0].message.reasoning_content:
#     print("\nThinking Process:\n" + completion.choices[0].message.reasoning_content)
# print("\nComplete Response:\n" + completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
import process from 'process';
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});
// Example of passing a single image (thinking mode enabled)
const completion = await openai.chat.completions.create({
model: 'kimi-k2.5',
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'What scene is depicted in the image?' },
{
type: 'image_url',
image_url: {
url: 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg'
}
}
]
}
],
enable_thinking: true // Enable thinking mode
});
// Print the thinking process
if (completion.choices[0].message.reasoning_content) {
console.log('\n' + '='.repeat(20) + 'Thinking Process' + '='.repeat(20) + '\n');
console.log(completion.choices[0].message.reasoning_content);
}
// Print the response content
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
console.log(completion.choices[0].message.content);
// Example of passing multiple images (thinking mode enabled, uncomment to use)
// const multiCompletion = await openai.chat.completions.create({
// model: 'kimi-k2.5',
// messages: [
// {
// role: 'user',
// content: [
// { type: 'text', text: 'What do these images depict?' },
// {
// type: 'image_url',
// image_url: { url: 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg' }
// },
// {
// type: 'image_url',
// image_url: { url: 'https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png' }
// }
// ]
// }
// ],
// enable_thinking: true
// });
//
// // Print the thinking process and response
// if (multiCompletion.choices[0].message.reasoning_content) {
// console.log('\nThinking Process:\n' + multiCompletion.choices[0].message.reasoning_content);
// }
// console.log('\nComplete Response:\n' + multiCompletion.choices[0].message.content);
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2.5",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What scene is depicted in the image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
}
}
]
}
],
"enable_thinking": true
}'
# Multi-image input example (uncomment to use)
# curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
# -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
# -H "Content-Type: application/json" \
# -d '{
# "model": "kimi-k2.5",
# "messages": [
# {
# "role": "user",
# "content": [
# {
# "type": "text",
# "text": "What do these images depict?"
# },
# {
# "type": "image_url",
# "image_url": {
# "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
# }
# },
# {
# "type": "image_url",
# "image_url": {
# "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
# }
# }
# ]
# }
# ],
# "enable_thinking": true,
# "stream": false
# }'
DashScope
Python
import os
from dashscope import MultiModalConversation

# Example of passing a single image (thinking mode enabled)
response = MultiModalConversation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "What scene is depicted in the image?"},
                {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}
            ]
        }
    ],
    enable_thinking=True  # Enable thinking mode
)
# Print the thinking process
if hasattr(response.output.choices[0].message, 'reasoning_content') and response.output.choices[0].message.reasoning_content:
    print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
    print(response.output.choices[0].message.reasoning_content)
# Print the response content
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
print(response.output.choices[0].message.content[0]["text"])

# Example of passing multiple images (thinking mode enabled, uncomment to use)
# response = MultiModalConversation.call(
#     api_key=os.getenv("DASHSCOPE_API_KEY"),
#     model="kimi-k2.5",
#     messages=[
#         {
#             "role": "user",
#             "content": [
#                 {"text": "What do these images depict?"},
#                 {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
#                 {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}
#             ]
#         }
#     ],
#     enable_thinking=True
# )
#
# # Print the thinking process and response
# if hasattr(response.output.choices[0].message, 'reasoning_content') and response.output.choices[0].message.reasoning_content:
#     print("\nThinking Process:\n" + response.output.choices[0].message.reasoning_content)
# print("\nComplete Response:\n" + response.output.choices[0].message.content[0]["text"])
Java
// DashScope SDK version >= 2.19.4
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
public class KimiK25MultiModalExample {
public static void main(String[] args) {
try {
// Single-image input example (thinking mode enabled)
MultiModalConversation conv = new MultiModalConversation();
// Build the message content
Map<String, Object> textContent = new HashMap<>();
textContent.put("text", "What scene is depicted in the image?");
Map<String, Object> imageContent = new HashMap<>();
imageContent.put("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg");
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(textContent, imageContent))
.build();
// Build the request parameters
MultiModalConversationParam param = MultiModalConversationParam.builder()
// If the environment variable is not set, replace this with your Model Studio API key
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("kimi-k2.5")
.messages(Arrays.asList(userMessage))
.enableThinking(true) // Enable thinking mode
.build();
// Call the model
MultiModalConversationResult result = conv.call(param);
// Print the result
String content = result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text");
System.out.println("Response content: " + content);
// If thinking mode is enabled, print the thinking process
if (result.getOutput().getChoices().get(0).getMessage().getReasoningContent() != null) {
System.out.println("\nThinking process: " +
result.getOutput().getChoices().get(0).getMessage().getReasoningContent());
}
// Multi-image input example (uncomment to use)
// Map<String, Object> imageContent1 = new HashMap<>();
// imageContent1.put("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg");
// Map<String, Object> imageContent2 = new HashMap<>();
// imageContent2.put("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png");
//
// Map<String, Object> textContent2 = new HashMap<>();
// textContent2.put("text", "What do these images depict?");
//
// MultiModalMessage multiImageMessage = MultiModalMessage.builder()
// .role(Role.USER.getValue())
// .content(Arrays.asList(textContent2, imageContent1, imageContent2))
// .build();
//
// MultiModalConversationParam multiParam = MultiModalConversationParam.builder()
// .apiKey(System.getenv("DASHSCOPE_API_KEY"))
// .model("kimi-k2.5")
// .messages(Arrays.asList(multiImageMessage))
// .enableThinking(true)
// .build();
//
// MultiModalConversationResult multiResult = conv.call(multiParam);
// System.out.println(multiResult.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
System.err.println("Call failed: " + e.getMessage());
}
}
}
curl
curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2.5",
"input": {
"messages": [
{
"role": "user",
"content": [
{
"text": "What scene is depicted in the image?"
},
{
"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
}
]
}
]
},
"parameters": {
"enable_thinking": true
}
}'
# Multi-image input example (uncomment to use)
# curl -X POST "https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
# -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
# -H "Content-Type: application/json" \
# -d '{
# "model": "kimi-k2.5",
# "input": {
# "messages": [
# {
# "role": "user",
# "content": [
# {
# "text": "What do these images depict?"
# },
# {
# "image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
# },
# {
# "image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
# }
# ]
# }
# ]
# },
# "parameters": {
# "enable_thinking": true
# }
# }'
Video understanding
Video files
kimi-k2.5 analyzes video content by extracting frames. Control the frame-extraction strategy using the following two parameters:
fps: Controls the frame-extraction frequency. One frame is extracted every 1/fps seconds. Valid values: 0.1 to 10. Default value: 2.0. For fast-moving scenes, use a higher fps value to capture more detail. For static or long videos, use a lower fps value to improve processing efficiency.
max_frames: Limits the maximum number of frames extracted from a video. Default and maximum value: 2000. If the total number of frames calculated from fps exceeds this limit, the system automatically samples frames evenly within the max_frames cap. This parameter is available only when using the DashScope SDK.
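The interaction between fps and max_frames described above reduces to simple arithmetic. The helper below is purely illustrative: the function name is an assumption for this sketch, not part of the DashScope SDK, and it only models the frame count, not which frames are chosen.

```python
def planned_frame_count(duration_seconds: float, fps: float = 2.0,
                        max_frames: int = 2000) -> int:
    """Number of frames the extractor would sample from a video.

    One frame is taken every 1/fps seconds; if that exceeds max_frames,
    frames are instead sampled evenly within the max_frames cap.
    """
    by_fps = int(duration_seconds * fps)
    return min(by_fps, max_frames)

print(planned_frame_count(60))         # 1-minute clip at the default 2.0 fps → 120
print(planned_frame_count(3600, 2.0))  # 1-hour video would need 7200 frames, capped at 2000
```

This is why lowering fps helps with long videos: it keeps the fps-based count under the cap so frames stay evenly spaced at the requested rate.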
OpenAI compatible
When using the OpenAI SDK or the HTTP method to pass a video file directly to the model, set the "type" parameter in the user message to "video_url".
Python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": [
                # When passing a video file directly, set the "type" value to "video_url"
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                    },
                    "fps": 2
                },
                {
                    "type": "text",
                    "text": "What is the content of this video?"
                }
            ]
        }
    ]
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});
async function main() {
const response = await openai.chat.completions.create({
model: "kimi-k2.5",
messages: [
{
role: "user",
content: [
// When passing a video file directly, set the "type" value to "video_url"
{
type: "video_url",
video_url: {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
},
"fps": 2
},
{
type: "text",
text: "What is the content of this video?"
}
]
}
]
});
console.log(response.choices[0].message.content);
}
main();
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "kimi-k2.5",
"messages": [
{
"role": "user",
"content": [
{
"type": "video_url",
"video_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
},
"fps":2
},
{
"type": "text",
"text": "What is the content of this video?"
}
]
}
]
}'
DashScope
Python
import dashscope
import os
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
messages = [
{"role": "user",
"content": [
# The fps parameter controls the frame extraction frequency, indicating one frame is extracted every 1/fps seconds
{"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4","fps":2},
{"text": "What is the content of this video?"}
]
}
]
response = dashscope.MultiModalConversation.call(
# If the environment variable is not set, replace the following line with your Model Studio API key: api_key ="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='kimi-k2.5',
messages=messages
)
print(response.output.choices[0].message.content[0]["text"])
Java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
// The fps parameter controls the frame extraction frequency, indicating one frame is extracted every 1/fps seconds
Map<String, Object> params = new HashMap<>();
params.put("video", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4");
params.put("fps", 2);
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
params,
Collections.singletonMap("text", "What is the content of this video?"))).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("kimi-k2.5")
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "kimi-k2.5",
"input":{
"messages":[
{"role": "user","content": [{"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4","fps":2},
{"text": "What is the content of this video?"}]}]}
}'
Image list
When a video is submitted as a list of images (pre-extracted video frames), use the fps parameter to tell the model the time interval between frames. This helps the model better understand the order of events, their duration, and dynamic changes. The model also supports specifying the frame-extraction rate for an original video through the fps parameter, meaning one frame is extracted from the original video every 1/fps seconds.
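When you extract the frames yourself, the fps value determines which source frames to keep: one frame every 1/fps seconds. A minimal sketch of the index arithmetic (pure Python; the function name and example values are ours, not part of the API):

```python
def sample_frame_indices(frame_count: int, source_fps: float, target_fps: float) -> list[int]:
    """Return the indices of frames to keep so that roughly one frame
    is sampled every 1/target_fps seconds of the original video."""
    step = source_fps / target_fps  # source frames per sampled frame
    indices = []
    i = 0.0
    while round(i) < frame_count:
        indices.append(round(i))
        i += step
    return indices

# A 90-frame, 30 fps clip sampled at fps=2 keeps every 15th frame:
print(sample_frame_indices(90, source_fps=30, target_fps=2))  # → [0, 15, 30, 45, 60, 75]
```

The resulting frames can then be passed as the image list shown below, with the same fps value in the request so the model knows the sampling interval.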
OpenAI compatible
When using the OpenAI SDK or the HTTP method to submit a video as an image list, set the "type" parameter in the user message to "video".
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="kimi-k2.5",
messages=[{"role": "user","content": [
# When passing an image list, set the "type" parameter in the user message to "video"
{"type": "video","video": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
"fps":2},
{"type": "text","text": "Describe the specific process of this video"},
]}]
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});
async function main() {
const response = await openai.chat.completions.create({
model: "kimi-k2.5",
messages: [{
role: "user",
content: [
{
// When passing an image list, set the "type" parameter in the user message to "video"
type: "video",
video: [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
"fps":2
},
{
type: "text",
text: "Describe the specific process of this video"
}
]
}]
});
console.log(response.choices[0].message.content);
}
main();
curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "kimi-k2.5",
"messages": [{"role": "user","content": [{"type": "video","video": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
"fps":2},
{"type": "text","text": "Describe the specific process of this video"}]}]
}'
DashScope
Python
import os
import dashscope
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
messages = [{"role": "user",
"content": [
{"video":["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
"fps":2},
{"text": "Describe the specific process of this video"}]}]
response = dashscope.MultiModalConversation.call(
# If the environment variable is not set, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
model='kimi-k2.5',
messages=messages
)
print(response.output.choices[0].message.content[0]["text"])
Java
// DashScope SDK version must be at least 2.21.10
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
private static final String MODEL_NAME = "kimi-k2.5";
public static void videoImageListSample() throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
Map<String, Object> params = new HashMap<>();
params.put("video", Arrays.asList("https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"));
params.put("fps", 2);
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
params,
Collections.singletonMap("text", "Describe the specific process of this video")))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(MODEL_NAME)
.messages(Arrays.asList(userMessage)).build();
MultiModalConversationResult result = conv.call(param);
System.out.print(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
videoImageListSample();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "kimi-k2.5",
"input": {
"messages": [
{
"role": "user",
"content": [
{
"video": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
],
"fps":2
},
{
"text": "Describe the specific process of this video"
}
]
}
]
}
}'
Pass local files
The following examples show how to pass local files. The OpenAI-compatible API supports only Base64 encoding. The DashScope SDK supports both Base64 encoding and local file paths.
OpenAI compatible
To pass Base64-encoded data, construct a Data URL. For instructions, see Construct a Data URL.
Python
from openai import OpenAI
import os
import base64
# Encoding function: Converts a local file to a Base64 encoded string
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
# Replace xxx/eagle.png with the absolute path of your local image
base64_image = encode_image("xxx/eagle.png")
client = OpenAI(
api_key=os.getenv('DASHSCOPE_API_KEY'),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="kimi-k2.5",
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {"url": f"data:image/png;base64,{base64_image}"},
},
{"type": "text", "text": "What scene is depicted in the image?"},
],
}
],
)
print(completion.choices[0].message.content)
# The following are examples for passing a local video file and an image list
# 【Local video file】Encode the local video as a Data URL and pass it in video_url:
# def encode_video_to_data_url(video_path):
# with open(video_path, "rb") as f:
# return "data:video/mp4;base64," + base64.b64encode(f.read()).decode("utf-8")
# video_data_url = encode_video_to_data_url("xxx/local.mp4")
# content = [{"type": "video_url", "video_url": {"url": video_data_url}, "fps": 2}, {"type": "text", "text": "What is the content of this video?"}]
# 【Local image list】Encode multiple local images as Base64 and form a video list:
# image_data_urls = [f"data:image/jpeg;base64,{encode_image(p)}" for p in ["xxx/f1.jpg", "xxx/f2.jpg", "xxx/f3.jpg", "xxx/f4.jpg"]]
# content = [{"type": "video", "video": image_data_urls, "fps": 2}, {"type": "text", "text": "Describe the specific process of this video"}]
Node.js
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
}
);
const encodeImage = (imagePath) => {
const imageFile = readFileSync(imagePath);
return imageFile.toString('base64');
};
// Replace xxx/eagle.png with the absolute path of your local image
const base64Image = encodeImage("xxx/eagle.png")
async function main() {
const completion = await openai.chat.completions.create({
model: "kimi-k2.5",
messages: [
{"role": "user",
"content": [{"type": "image_url",
"image_url": {"url": `data:image/png;base64,${base64Image}`},},
{"type": "text", "text": "What scene is depicted in the image?"}]}]
});
console.log(completion.choices[0].message.content);
}
main();
// The following are examples for passing a local video file and an image list
// 【Local video file】Encode the local video as a Data URL and pass it in video_url:
// const encodeVideoToDataUrl = (videoPath) => "data:video/mp4;base64," + readFileSync(videoPath).toString("base64");
// const videoDataUrl = encodeVideoToDataUrl("xxx/local.mp4");
// content: [{ type: "video_url", video_url: { url: videoDataUrl }, fps: 2 }, { type: "text", text: "What is the content of this video?" }]
// 【Local image list】Encode multiple local images as Base64 and form a video list:
// const imageDataUrls = ["xxx/f1.jpg","xxx/f2.jpg","xxx/f3.jpg","xxx/f4.jpg"].map(p => `data:image/jpeg;base64,${encodeImage(p)}`);
// content: [{ type: "video", video: imageDataUrls, fps: 2 }, { type: "text", text: "Describe the specific process of this video" }]
// messages: [{ role: "user", content: content }]
// Then call openai.chat.completions.create({ model: "kimi-k2.5", messages: messages })
DashScope
Base64 encoding method
To use Base64 encoding, construct a Data URL. For instructions, see Construct a Data URL.
Python
import base64
import os
import dashscope
from dashscope import MultiModalConversation
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
# Encoding function: Converts a local file to a Base64 encoded string
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
# Replace xxx/eagle.png with the absolute path of your local image
base64_image = encode_image("xxx/eagle.png")
messages = [
{
"role": "user",
"content": [
{"image": f"data:image/png;base64,{base64_image}"},
{"text": "What scene is depicted in the image?"},
],
},
]
response = MultiModalConversation.call(
# If the environment variable is not set, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="kimi-k2.5",
messages=messages,
)
print(response.output.choices[0].message.content[0]["text"])
# The following are examples for passing a local video file and an image list
# 【Local video file】
# video_data_url = "data:video/mp4;base64," + base64.b64encode(open("xxx/local.mp4","rb").read()).decode("utf-8")
# content: [{"video": video_data_url, "fps": 2}, {"text": "What is the content of this video?"}]
# 【Local image list】
# Base64: image_data_urls = [f"data:image/jpeg;base64,{encode_image(p)}" for p in ["xxx/f1.jpg","xxx/f2.jpg","xxx/f3.jpg","xxx/f4.jpg"]]
# content: [{"video": image_data_urls, "fps": 2}, {"text": "Describe the specific process of this video"}]
Java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Base64;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import com.alibaba.dashscope.aigc.multimodalconversation.*;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
private static String encodeToBase64(String imagePath) throws IOException {
Path path = Paths.get(imagePath);
byte[] imageBytes = Files.readAllBytes(path);
return Base64.getEncoder().encodeToString(imageBytes);
}
public static void callWithLocalFile(String localPath) throws ApiException, NoApiKeyException, UploadFileException, IOException {
String base64Image = encodeToBase64(localPath); // Base64 encoding
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
new HashMap<String, Object>() {{ put("image", "data:image/png;base64," + base64Image); }},
new HashMap<String, Object>() {{ put("text", "What scene is depicted in the image?"); }}
)).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("kimi-k2.5")
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
// Replace xxx/eagle.png with the absolute path of your local image
callWithLocalFile("xxx/eagle.png");
} catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
    // The following are examples for passing a local video file and an image list
    // 【Local video file】
    // String base64Video = encodeToBase64(localPath);
    // MultiModalConversation conv = new MultiModalConversation();
    // MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //         .content(Arrays.asList(
    //                 new HashMap<String, Object>() {{ put("video", "data:video/mp4;base64," + base64Video); }},
    //                 new HashMap<String, Object>() {{ put("text", "What is the content of this video?"); }}
    //         )).build();
    // 【Local image list】
    // List<String> urls = Arrays.asList(
    //         "data:image/jpeg;base64," + encodeToBase64("path/f1.jpg"),
    //         "data:image/jpeg;base64," + encodeToBase64("path/f2.jpg"),
    //         "data:image/jpeg;base64," + encodeToBase64("path/f3.jpg"),
    //         "data:image/jpeg;base64," + encodeToBase64("path/f4.jpg"));
    // MultiModalConversation conv = new MultiModalConversation();
    // MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //         .content(Arrays.asList(
    //                 new HashMap<String, Object>() {{ put("video", urls); }},
    //                 new HashMap<String, Object>() {{ put("text", "Describe the specific process of this video"); }}
    //         )).build();
}
Local file path method
Pass the local file path directly to the model. This method is supported only by the DashScope Python and Java SDKs; it is not supported by DashScope HTTP or the OpenAI-compatible methods. See the following table to determine the file path format for your programming language and operating system.
Python
import os
from dashscope import MultiModalConversation
import dashscope
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
# Replace xxx/eagle.png with the absolute path of your local image
local_path = "xxx/eagle.png"
image_path = f"file://{local_path}"
messages = [
{'role':'user',
'content': [{'image': image_path},
{'text': 'What scene is depicted in the image?'}]}]
response = MultiModalConversation.call(
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='kimi-k2.5',
messages=messages)
print(response.output.choices[0].message.content[0]["text"])
# The following are examples for passing a video and image list using local file paths
# 【Local video file】
# video_path = "file:///path/to/local.mp4"
# content: [{"video": video_path, "fps": 2}, {"text": "What is the content of this video?"}]
# 【Local image list】
# image_paths = ["file:///path/f1.jpg", "file:///path/f2.jpg", "file:///path/f3.jpg", "file:///path/f4.jpg"]
# content: [{"video": image_paths, "fps": 2}, {"text": "Describe the specific process of this video"}]
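The file:// prefix used in the paths above can be built portably with Python's pathlib; this is a convenience sketch rather than part of the SDK. Note that as_uri() percent-encodes non-ASCII characters, so plain ASCII paths are safest if the SDK's handling of encoded paths is unverified; the path below is a hypothetical example.

```python
from pathlib import Path

def to_file_url(local_path: str) -> str:
    """Build a file:// URL from a local path; pathlib normalizes the
    platform differences (POSIX paths vs. Windows drive letters)."""
    return Path(local_path).absolute().as_uri()

print(to_file_url("/data/images/eagle.png"))  # → file:///data/images/eagle.png
```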
Java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {Constants.baseHttpApiUrl="https://dashscope.aliyuncs.com/api/v1";}
public static void callWithLocalFile(String localPath)
throws ApiException, NoApiKeyException, UploadFileException {
String filePath = "file://"+localPath;
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(new HashMap<String, Object>(){{put("image", filePath);}},
new HashMap<String, Object>(){{put("text", "What scene is depicted in the image?");}})).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("kimi-k2.5")
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));}
public static void main(String[] args) {
try {
// Replace xxx/eagle.png with the absolute path of your local image
callWithLocalFile("xxx/eagle.png");
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
    // The following are examples for passing a video and image list using local file paths
    // 【Local video file】
    // String filePath = "file://" + localPath;
    // MultiModalConversation conv = new MultiModalConversation();
    // MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //         .content(Arrays.asList(new HashMap<String, Object>(){{put("video", filePath);}},
    //                 new HashMap<String, Object>(){{put("text", "What is the content of this video?");}})).build();
    // 【Local image list】
    // MultiModalConversation conv = new MultiModalConversation();
    // List<String> filePaths = Arrays.asList("file:///path/f1.jpg", "file:///path/f2.jpg", "file:///path/f3.jpg", "file:///path/f4.jpg");
    // MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
    //         .content(Arrays.asList(new HashMap<String, Object>(){{put("video", filePaths);}},
    //                 new HashMap<String, Object>(){{put("text", "Describe the specific process of this video");}})).build();
}
File limits
Image limits
Image resolution:
Minimum size: The width and height of an image must both be greater than 10 pixels.
Aspect ratio: The ratio of the long side to the short side must not exceed 200:1.
Pixel limit: We recommend keeping image resolution within 8K (7680×4320). Images above this resolution may cause API call timeouts because of large file sizes and long network transmission times.
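The resolution rules above can be checked locally before upload. A minimal sketch (the limits are taken from this section; the function name is ours):

```python
def check_image_dims(width: int, height: int) -> list[str]:
    """Validate image dimensions against the documented limits:
    both sides > 10 px, aspect ratio <= 200:1, at most 8K (7680x4320) recommended."""
    problems = []
    if width <= 10 or height <= 10:
        problems.append("width and height must both exceed 10 px")
    long_side, short_side = max(width, height), min(width, height)
    if short_side and long_side / short_side > 200:
        problems.append("aspect ratio exceeds 200:1")
    if width * height > 7680 * 4320:
        problems.append("resolution above 8K may cause API timeouts")
    return problems

print(check_image_dims(1920, 1080))  # → []
print(check_image_dims(4000, 15))    # → ['aspect ratio exceeds 200:1']
```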
Supported image formats
For resolutions below 4K (3840×2160), the following image formats are supported:
Image format | Common file extensions | MIME type
BMP | .bmp | image/bmp
JPEG | .jpe, .jpeg, .jpg | image/jpeg
PNG | .png | image/png
TIFF | .tif, .tiff | image/tiff
WEBP | .webp | image/webp
HEIC | .heic | image/heic
For resolutions between 4K (3840×2160) and 8K (7680×4320), only the JPEG, JPG, and PNG formats are supported.
Image size:
When passing a public URL or local path: A single image must not exceed 10 MB.
When passing a Base64-encoded string: The encoded string must not exceed 10 MB.
To reduce file size, see How to compress an image or video to a target size.
Supported number of images: When you pass multiple images, the number of images is limited by the model's maximum input tokens. The total token count of all images and text combined must stay below this limit.
Video limits
Submitted as an image list: At least 4 images and at most 2,000 images.
Submitted as a video file:
Video size:
When submitted as a public URL: At most 2 GB.
When submitted as a Base64-encoded string: Less than 10 MB.
When submitted as a local file path: The video itself must not exceed 100 MB.
Video duration: 2 seconds to 1 hour.
Video formats: MP4, AVI, MKV, MOV, FLV, WMV, and others.
Video resolution: No specific limit. We recommend staying below 2K; higher resolutions increase processing time without improving the model's understanding.
Audio understanding: Audio tracks in video files are not supported.
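Base64 encoding expands data by a factor of about 4/3, so a video must be roughly 7.5 MB on disk to stay under the 10 MB encoded limit. A quick size check before encoding (the 10 MB limit comes from this section; the helper is ours):

```python
import base64

def base64_size(raw_bytes: int) -> int:
    """Exact size in bytes of the Base64 encoding of raw_bytes input bytes
    (4 output characters for every 3 input bytes, with padding)."""
    return 4 * ((raw_bytes + 2) // 3)

MB = 1024 * 1024
print(base64_size(8 * MB) <= 10 * MB)  # → False: an 8 MB file encodes to about 10.7 MB
```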
Model features
Model
kimi-k2.5
kimi-k2-thinking
Moonshot-Kimi-K2-Instruct
Default parameter values
Model | enable_thinking | temperature | top_p | presence_penalty | fps | max_frames |
kimi-k2.5 | false | Thinking mode: 1.0 Non-thinking mode: 0.6 | Thinking/non-thinking mode: 0.95 | Thinking/non-thinking mode: 0.0 | 2 | 2000 |
kimi-k2-thinking | - | 1.0 | - | - | - | - |
Moonshot-Kimi-K2-Instruct | - | 0.6 | 1.0 | 0 | - | - |
A hyphen (-) indicates that the parameter has no default value and cannot be set.
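These defaults apply when a parameter is omitted from the request; to override them, pass the parameter explicitly. A minimal sketch of the request arguments for the OpenAI-compatible interface (no network call is made here; whether kimi-k2.5 reads enable_thinking from extra_body is our assumption, mirroring how other Model Studio models toggle thinking mode):

```python
# Keyword arguments for client.chat.completions.create(); shown as a plain dict
# so the request shape is visible without making a network call.
request_kwargs = {
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.6,  # override the thinking-mode default of 1.0
    "top_p": 0.95,
    # Assumption: thinking mode toggled via extra_body, as on other Model Studio models.
    "extra_body": {"enable_thinking": True},
}
print(sorted(request_kwargs))
```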
Error codes
If a model call fails and returns an error message, see Error messages to troubleshoot the issue.