The QVQ model has powerful visual reasoning capabilities. It first outputs its thinking process, then the response content. Currently, QVQ supports only streaming output.
Supported models
QVQ is a visual reasoning model that supports visual input and chain-of-thought output. It shows stronger capabilities in mathematics, programming, visual analysis, creation, and general tasks.
Name | Version | Context window (tokens) | Maximum input (tokens) | Maximum CoT (tokens) | Maximum response (tokens)
--- | --- | --- | --- | --- | ---
qvq-max (same performance as qvq-max-2025-03-25) | Stable | 131,072 | 106,496 (up to 16,384 per image) | 16,384 | 8,192
qvq-max-latest (always same performance as the latest snapshot) | Latest | 131,072 | 106,496 (up to 16,384 per image) | 16,384 | 8,192
qvq-max-2025-03-25 (also qvq-max-0325) | Snapshot | 131,072 | 106,496 (up to 16,384 per image) | 16,384 | 8,192
Input and output price: time-limited free trial. After the free quota runs out, you cannot access this model; please stay tuned for updates.
Free quota: 1 million tokens each, valid for 180 days after activation.
For concurrent rate limiting, see Rate limits.
Get started
Prerequisites: You must have obtained an API key and configured it as an environment variable. To use the SDKs, you must install the OpenAI or DashScope SDK. The DashScope SDK for Java must be version 2.19.0 or later.
Due to the long reasoning time, QVQ currently supports only streaming output.
Thinking cannot be disabled for QVQ.
QVQ does not support System Message.
For the DashScope method: incremental_output defaults to true and cannot be set to false; result_format defaults to "message" and cannot be set to "text".
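For example, a call that spells out these defaults explicitly might look like the following minimal sketch. Since true and "message" are the only accepted values, passing them is optional and shown here purely for clarity (this assumes the SDK accepts both keyword arguments, which the defaults above imply):
import os
import dashscope
from dashscope import MultiModalConversation
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
response = MultiModalConversation.call(
    # If environment variable is not configured, replace with your Model Studio API Key: api_key="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model="qvq-max",
    messages=[{"role": "user", "content": [
        {"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
        {"text": "Solve this problem"}
    ]}],
    stream=True,
    incremental_output=True,   # the default; False is rejected for QVQ
    result_format="message",   # the default; "text" is rejected for QVQ
)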
The following sample code uses an image URL for understanding. Check the limitations on input images section. To use local images, see Using local files.
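For a quick picture of the local-file route before reading that topic, the sketch below Base64-encodes a local image and passes it as a data URL in the OpenAI-compatible format (the file name test.png is a placeholder):
import base64
import os
from openai import OpenAI

def encode_image(image_path):
    # Read a local image and return its Base64 string
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

base64_image = encode_image("test.png")  # placeholder local file
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qvq-max",
    messages=[{
        "role": "user",
        "content": [
            # Data URL form: data:image/{format};base64,{data}
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{base64_image}"}},
            {"type": "text", "text": "How to solve this problem?"},
        ],
    }],
    stream=True,
)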
OpenAI
Python
Sample code
from openai import OpenAI
import os
# Initialize OpenAI client
client = OpenAI(
# If environment variable is not configured, replace with your Model Studio API Key: api_key="sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)
reasoning_content = "" # Define complete thinking process
answer_content = "" # Define complete response
is_answering = False # Determine if thinking process has ended and response has begun
# Create chat completion request
completion = client.chat.completions.create(
model="qvq-max", # Using qvq-max as an example, can be replaced with other model names as needed
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
},
},
{"type": "text", "text": "How to solve this problem?"},
],
},
],
stream=True,
# Uncomment the following to return token usage in the last chunk
# stream_options={
# "include_usage": True
# }
)
print("\n" + "=" * 20 + "Reasoning Process" + "=" * 20 + "\n")
for chunk in completion:
# If chunk.choices is empty, print usage
if not chunk.choices:
print("\nUsage:")
print(chunk.usage)
else:
delta = chunk.choices[0].delta
# Print thinking process
if hasattr(delta, 'reasoning_content') and delta.reasoning_content != None:
print(delta.reasoning_content, end='', flush=True)
reasoning_content += delta.reasoning_content
else:
# Start response
if delta.content != "" and is_answering is False:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
is_answering = True
# Print response process
print(delta.content, end='', flush=True)
answer_content += delta.content
# print("=" * 20 + "Complete Reasoning Process" + "=" * 20 + "\n")
# print(reasoning_content)
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(answer_content)
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';
// Initialize openai client
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY, // Read from environment variable
baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
let messages = [
{
role: "user",
content: [
{ type: "image_url", image_url: { "url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg" } },
{ type: "text", text: "Solve this problem" },
]
}]
async function main() {
try {
const stream = await openai.chat.completions.create({
model: 'qvq-max',
messages: messages,
stream: true
});
console.log('\n' + '='.repeat(20) + 'Reasoning Process' + '='.repeat(20) + '\n');
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\nUsage:');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Process thinking process
if (delta.reasoning_content) {
process.stdout.write(delta.reasoning_content);
reasoningContent += delta.reasoning_content;
}
// Process formal response
else if (delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
} catch (error) {
console.error('Error:', error);
}
}
main();
HTTP
Sample code
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "qvq-max",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
}
},
{
"type": "text",
"text": "Solve this problem"
}
]
}
],
"stream":true,
"stream_options":{"include_usage":true}
}'
DashScope
Python
Sample code
import os
import dashscope
from dashscope import MultiModalConversation
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{
"role": "user",
"content": [
{"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
{"text": "Solve this problem."}
]
}
]
response = MultiModalConversation.call(
# If environment variable is not configured, replace with your Model Studio API Key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model="qvq-max", # Using qvq-max as an example, can be replaced with other model names as needed
messages=messages,
stream=True,
)
# Define complete thinking process
reasoning_content = ""
# Define complete response
answer_content = ""
# Determine if thinking process has ended and response has begun
is_answering = False
print("=" * 20 + "Reasoning Process" + "=" * 20)
for chunk in response:
# If both thinking process and response are empty, ignore
message = chunk.output.choices[0].message
reasoning_content_chunk = message.get("reasoning_content", None)
if (chunk.output.choices[0].message.content == [] and
reasoning_content_chunk == ""):
pass
else:
# If current is thinking process
if reasoning_content_chunk != None and chunk.output.choices[0].message.content == []:
print(chunk.output.choices[0].message.reasoning_content, end="")
reasoning_content += chunk.output.choices[0].message.reasoning_content
# If current is response
elif chunk.output.choices[0].message.content != []:
if not is_answering:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20)
is_answering = True
print(chunk.output.choices[0].message.content[0]["text"], end="")
answer_content += chunk.output.choices[0].message.content[0]["text"]
# If you need to print the complete thinking process and complete response, uncomment the following code
# print("=" * 20 + "Complete Reasoning Process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(f"{answer_content}")
Java
Sample code
// dashscope SDK version >= 2.19.0
import java.util.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.exception.InputRequiredException;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(MultiModalConversationResult message) {
String re = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String reasoning = Objects.isNull(re) ? "" : re; // Default value
        List<Map<String, Object>> content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (!reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Reasoning Process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (Objects.nonNull(content) && !content.isEmpty()) {
Object text = content.get(0).get("text");
            finalContent.append(text);
if (!isFirstPrint) {
System.out.println("\n====================Complete Response====================");
isFirstPrint = true;
}
System.out.print(text);
}
}
public static MultiModalConversationParam buildMultiModalConversationParam(MultiModalMessage Msg) {
return MultiModalConversationParam.builder()
// If environment variable is not configured, replace with your Model Studio API Key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// Using qvq-max as an example, can be replaced with other model names as needed
.model("qvq-max")
.messages(Arrays.asList(Msg))
.incrementalOutput(true)
.build();
}
public static void streamCallWithMessage(MultiModalConversation conv, MultiModalMessage Msg)
throws NoApiKeyException, ApiException, InputRequiredException, UploadFileException {
MultiModalConversationParam param = buildMultiModalConversationParam(Msg);
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
result.blockingForEach(message -> {
handleGenerationResult(message);
});
}
public static void main(String[] args) {
try {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMsg = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("image", "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"),
Collections.singletonMap("text", "Solve this problem")))
.build();
streamCallWithMessage(conv, userMsg);
// Print final result
// if (reasoningContent.length() > 0) {
// System.out.println("\n====================Complete Response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | UploadFileException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
}
}
HTTP
Sample code
curl
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
"model": "qvq-max",
"input":{
"messages":[
{
"role": "user",
"content": [
{"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
{"text": "Please solve this problem"}
]
}
]
}
}'
Multi-round conversation
By default, the QVQ API does not store your conversation history. The multi-round conversation feature equips the model with the ability to "remember" past interactions, catering to scenarios such as follow-up questions and information gathering. You will receive reasoning_content and content from QVQ. You just need to include content in the context by using {'role': 'assistant', 'content': concatenated streaming output content}. reasoning_content is not required.
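Independent of SDK, the essential pattern is: accumulate the streamed content, then append it to the history as an assistant message, discarding reasoning_content. A minimal sketch reusing names from the samples below (completion is the streaming response from a previous call):
answer_content = ""
for chunk in completion:  # streaming response from the previous call
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if getattr(delta, "reasoning_content", None):
        pass  # reasoning_content is NOT added to the context
    elif delta.content:
        answer_content += delta.content

# Only the concatenated response goes back into the history
messages.append({"role": "assistant", "content": answer_content})
messages.append({"role": "user", "content": [{"type": "text", "text": "Why?"}]})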
OpenAI
Implement multi-round conversation through OpenAI SDK or OpenAI-compatible HTTP method.
Python
Sample code
from openai import OpenAI
import os
# Initialize OpenAI client
client = OpenAI(
# If environment variable is not configured, replace with your Model Studio API Key: api_key="sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)
messages = [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
},
},
{"type": "text", "text": "Solve this problem"},
],
}
]
conversation_idx = 1
while True:
reasoning_content = "" # Define complete thinking process
answer_content = "" # Define complete response
is_answering = False # Determine if thinking process has ended and response has begun
print("="*20+f"Round {conversation_idx}"+"="*20)
conversation_idx += 1
# Create chat completion request
completion = client.chat.completions.create(
model="qvq-max", # Using qvq-max as an example, can be replaced with other model names as needed
messages=messages,
stream=True
)
print("\n" + "=" * 20 + "Reasoning Process" + "=" * 20 + "\n")
for chunk in completion:
# If chunk.choices is empty, print usage
if not chunk.choices:
print("\nUsage:")
print(chunk.usage)
else:
delta = chunk.choices[0].delta
# Print thinking process
if hasattr(delta, 'reasoning_content') and delta.reasoning_content != None:
print(delta.reasoning_content, end='', flush=True)
reasoning_content += delta.reasoning_content
else:
# Start response
if delta.content != "" and is_answering is False:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
is_answering = True
# Print response process
print(delta.content, end='', flush=True)
answer_content += delta.content
messages.append({"role": "assistant", "content": answer_content})
messages.append({
"role": "user",
"content": [
{
"type": "text",
"text": input("\nEnter your message: ")
}
]
})
print("\n")
# print("=" * 20 + "Complete Reasoning Process" + "=" * 20 + "\n")
# print(reasoning_content)
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(answer_content)
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';
import readline from 'readline/promises';
// Initialize readline interface
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout
});
// Initialize openai client
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY, // Read from environment variable
baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});
let messages = [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
},
},
{"type": "text", "text": "Solve this problem"},
],
}
];
let conversationIdx = 1;
async function main() {
    while (true) {
        // Reset per-round state
        let reasoningContent = '';
        let answerContent = '';
        let isAnswering = false;
        console.log("=".repeat(20) + `Round ${conversationIdx}` + "=".repeat(20));
        conversationIdx++;
try {
const stream = await openai.chat.completions.create({
model: 'qvq-max',
messages: messages,
stream: true
});
console.log("\n" + "=".repeat(20) + "Reasoning Process" + "=".repeat(20) + "\n");
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\nUsage:');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Process thinking process
if (delta.reasoning_content) {
process.stdout.write(delta.reasoning_content);
reasoningContent += delta.reasoning_content;
}
// Process formal response
if (delta.content) {
if (!isAnswering) {
console.log('\n' + "=".repeat(20) + "Complete Response" + "=".repeat(20) + "\n");
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
// Add complete response to message history
messages.push({ role: 'assistant', content: answerContent });
const userInput = await rl.question("Enter your message: ");
messages.push({"role": "user", "content":userInput});
console.log("\n");
} catch (error) {
console.error('Error:', error);
}
}
}
// Start program
main().catch(console.error);
HTTP
Sample code
curl
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qvq-max",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
}
},
{
"type": "text",
"text": "Solve this problem"
}
]
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "Rectangular prism: surface area is 52, volume is 24. Cube: surface area is 54, volume is 27."
}
]
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "What is the formula for the area of a triangle?"
}
]
}
],
"stream":true,
"stream_options":{"include_usage":true}
}'
DashScope
Implement multi-round conversation through DashScope SDK or HTTP method.
Python
Sample code
import os
import dashscope
from dashscope import MultiModalConversation
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{
"role": "user",
"content": [
{"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
{"text": "Solve this problem"}
]
}
]
conversation_idx = 1
while True:
print("=" * 20 + f"Round {conversation_idx}" + "=" * 20)
conversation_idx += 1
response = MultiModalConversation.call(
# If environment variable is not configured, replace with your Model Studio API Key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model="qvq-max", # Using qvq-max as an example, can be replaced with other model names as needed
messages=messages,
stream=True,
)
# Define complete thinking process
reasoning_content = ""
# Define complete response
answer_content = ""
# Determine if thinking process has ended and response has begun
is_answering = False
print("=" * 20 + "Reasoning Process" + "=" * 20)
for chunk in response:
# If both thinking process and response are empty, ignore
message = chunk.output.choices[0].message
reasoning_content_chunk = message.get("reasoning_content", None)
if (chunk.output.choices[0].message.content == [] and
reasoning_content_chunk == ""):
pass
else:
# If current is thinking process
if reasoning_content_chunk != None and chunk.output.choices[0].message.content == []:
print(chunk.output.choices[0].message.reasoning_content, end="")
reasoning_content += chunk.output.choices[0].message.reasoning_content
# If current is response
elif chunk.output.choices[0].message.content != []:
if not is_answering:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20)
is_answering = True
print(chunk.output.choices[0].message.content[0]["text"], end="")
answer_content += chunk.output.choices[0].message.content[0]["text"]
    messages.append({"role": "assistant", "content": [{"text": answer_content}]})
    messages.append({
        "role": "user",
        "content": [
            {"text": input("\nEnter your message: ")}
        ]
    })
print("\n")
# If you need to print the complete thinking process and complete response, uncomment the following code
# print("=" * 20 + "Complete Reasoning Process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(f"{answer_content}")
Java
Sample code
// dashscope SDK version >= 2.19.0
import java.util.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.exception.InputRequiredException;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(MultiModalConversationResult message) {
String re = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String reasoning = Objects.isNull(re) ? "" : re; // Default value
        List<Map<String, Object>> content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (!reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Reasoning Process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (Objects.nonNull(content) && !content.isEmpty()) {
Object text = content.get(0).get("text");
finalContent.append(content.get(0).get("text"));
if (!isFirstPrint) {
System.out.println("\n====================Complete Response====================");
isFirstPrint = true;
}
System.out.print(text);
}
}
    public static MultiModalConversationParam buildMultiModalConversationParam(List<MultiModalMessage> Msg) {
return MultiModalConversationParam.builder()
// If environment variable is not configured, replace with your Model Studio API Key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// Using qvq-max as an example, can be replaced with other model names as needed
.model("qvq-max")
.messages(Msg)
.incrementalOutput(true)
.build();
}
    public static void streamCallWithMessage(MultiModalConversation conv, List<MultiModalMessage> Msg)
throws NoApiKeyException, ApiException, InputRequiredException, UploadFileException {
MultiModalConversationParam param = buildMultiModalConversationParam(Msg);
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
result.blockingForEach(message -> {
handleGenerationResult(message);
});
}
public static void main(String[] args) {
try {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMsg1 = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("image", "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"),
Collections.singletonMap("text", "Solve this problem")))
.build();
MultiModalMessage AssistantMsg = MultiModalMessage.builder()
.role(Role.ASSISTANT.getValue())
.content(Arrays.asList(Collections.singletonMap("text", "Rectangular prism: surface area is 52, volume is 24. Cube: surface area is 54, volume is 27.")))
.build();
MultiModalMessage userMsg2 = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("text", "What is the formula for the area of a triangle?")))
.build();
            List<MultiModalMessage> Msg = Arrays.asList(userMsg1, AssistantMsg, userMsg2);
streamCallWithMessage(conv, Msg);
// Print final result
// if (reasoningContent.length() > 0) {
// System.out.println("\n====================Complete Response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | UploadFileException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
}
}
HTTP
Sample code
curl
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
"model": "qvq-max",
"input":{
"messages":[
{
"role": "user",
"content": [
{"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
{"text": "Solve this problem"}
]
},
{
"role": "assistant",
"content": [
{"text": "Rectangular prism: surface area is 52, volume is 24. Cube: surface area is 54, volume is 27."}
]
},
{
"role": "user",
"content": [
{"text": "What is the formula for the area of a triangle?"}
]
}
]
}
}'
Multiple image input
QVQ can process multiple images in a single request, and the model will respond based on all of them. You can input images as URLs or local files, or a combination of both. The sample codes use URLs.
The total number of tokens in the input images must be less than the maximum input of the model. Calculate the maximum number of images based on Image number limits.
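For example, a small helper that accepts either URLs or local paths and builds the combined content list might look like this sketch (local_page.jpg is a placeholder; local files are sent as Base64 data URLs):
import base64

def image_part(source):
    """Build an image_url content part from an http(s) URL or a local path."""
    if source.startswith(("http://", "https://")):
        return {"type": "image_url", "image_url": {"url": source}}
    with open(source, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{data}"}}

sources = [
    "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg",
    "local_page.jpg",  # placeholder local file
]
content = [image_part(s) for s in sources]
content.append({"type": "text", "text": "Describe each image."})
messages = [{"role": "user", "content": content}]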
OpenAI
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
reasoning_content = "" # Define complete thinking process
answer_content = "" # Define complete response
is_answering = False # Determine if thinking process has ended and response has begun
completion = client.chat.completions.create(
model="qvq-max",
messages=[
{"role": "user", "content": [
# First image link, if passing a local file, replace the url value with the Base64 encoded format of the image
{"type": "image_url", "image_url": {
"url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"}, },
# Second image link, if passing a local file, replace the url value with the Base64 encoded format of the image
{"type": "image_url",
"image_url": {"url": "https://img.alicdn.com/imgextra/i1/O1CN01ukECva1cisjyK6ZDK_!!6000000003635-0-tps-1500-1734.jpg"}, },
{"type": "text", "text": "Answer the question in the first image, then interpret the article in the second image."},
],
}
],
stream=True,
# Uncomment the following to return token usage in the last chunk
# stream_options={
# "include_usage": True
# }
)
print("\n" + "=" * 20 + "Reasoning Process" + "=" * 20 + "\n")
for chunk in completion:
# If chunk.choices is empty, print usage
if not chunk.choices:
print("\nUsage:")
print(chunk.usage)
else:
delta = chunk.choices[0].delta
# Print thinking process
if hasattr(delta, 'reasoning_content') and delta.reasoning_content != None:
print(delta.reasoning_content, end='', flush=True)
reasoning_content += delta.reasoning_content
else:
# Start response
if delta.content != "" and is_answering is False:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
is_answering = True
# Print response process
print(delta.content, end='', flush=True)
answer_content += delta.content
# print("=" * 20 + "Complete Reasoning Process" + "=" * 20 + "\n")
# print(reasoning_content)
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(answer_content)
Node.js
import OpenAI from "openai";
import process from 'process';
// Initialize openai client
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY, // Read from environment variable
baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
let messages = [
{role: "user",content: [
// First image link, if passing a local file, replace the url value with the Base64 encoded format of the image
{type: "image_url",image_url: {"url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"}},
// Second image link, if passing a local file, replace the url value with the Base64 encoded format of the image
{type: "image_url",image_url: {"url": "https://img.alicdn.com/imgextra/i1/O1CN01ukECva1cisjyK6ZDK_!!6000000003635-0-tps-1500-1734.jpg"}},
{type: "text", text: "Answer the question in the first image, then interpret the article in the second image." },
]}]
async function main() {
try {
const stream = await openai.chat.completions.create({
model: 'qvq-max',
messages: messages,
stream: true
});
console.log('\n' + '='.repeat(20) + 'Reasoning Process' + '='.repeat(20) + '\n');
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\nUsage:');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Process thinking process
if (delta.reasoning_content) {
process.stdout.write(delta.reasoning_content);
reasoningContent += delta.reasoning_content;
}
// Process formal response
else if (delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
} catch (error) {
console.error('Error:', error);
}
}
main();
curl
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qvq-max",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
}
},
{
"type": "image_url",
"image_url": {
"url": "https://img.alicdn.com/imgextra/i1/O1CN01ukECva1cisjyK6ZDK_!!6000000003635-0-tps-1500-1734.jpg"
}
},
{
"type": "text",
"text": "Answer the question in the first image, then interpret the article in the second image."
}
]
}
],
"stream":true,
"stream_options":{"include_usage":true}
}'
DashScope
Python
import os
import dashscope
from dashscope import MultiModalConversation
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{
"role": "user",
"content": [
# First image link, if passing a local file, replace the url value with the Base64 encoded format of the image
{"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
# Second image link, if passing a local file, replace the url value with the Base64 encoded format of the image
{"image": "https://img.alicdn.com/imgextra/i1/O1CN01ukECva1cisjyK6ZDK_!!6000000003635-0-tps-1500-1734.jpg"},
{"text": "Answer the question in the first image, then interpret the article in the second image."}
]
}
]
response = MultiModalConversation.call(
# If environment variable is not configured, replace with your Model Studio API Key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model="qvq-max", # Using qvq-max as an example, can be replaced with other model names as needed
messages=messages,
stream=True,
)
# Define complete thinking process
reasoning_content = ""
# Define complete response
answer_content = ""
# Determine if thinking process has ended and response has begun
is_answering = False
print("=" * 20 + "Reasoning Process" + "=" * 20)
for chunk in response:
# If both thinking process and response are empty, ignore
message = chunk.output.choices[0].message
reasoning_content_chunk = message.get("reasoning_content", None)
if (chunk.output.choices[0].message.content == [] and
reasoning_content_chunk == ""):
pass
else:
# If current is thinking process
if reasoning_content_chunk != None and chunk.output.choices[0].message.content == []:
print(chunk.output.choices[0].message.reasoning_content, end="")
reasoning_content += chunk.output.choices[0].message.reasoning_content
# If current is response
elif chunk.output.choices[0].message.content != []:
if not is_answering:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20)
is_answering = True
print(chunk.output.choices[0].message.content[0]["text"], end="")
answer_content += chunk.output.choices[0].message.content[0]["text"]
# If you need to print the complete thinking process and complete response, uncomment the following code
# print("=" * 20 + "Complete Reasoning Process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(f"{answer_content}")
Java
// dashscope SDK version >= 2.19.0
import java.util.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.exception.InputRequiredException;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(MultiModalConversationResult message) {
String re = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String reasoning = Objects.isNull(re) ? "" : re; // Default value
        List<Map<String, Object>> content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (!reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Reasoning Process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (Objects.nonNull(content) && !content.isEmpty()) {
Object text = content.get(0).get("text");
finalContent.append(text);
if (!isFirstPrint) {
System.out.println("\n====================Complete Response====================");
isFirstPrint = true;
}
System.out.print(text);
}
}
public static MultiModalConversationParam buildMultiModalConversationParam(MultiModalMessage Msg) {
return MultiModalConversationParam.builder()
// If environment variable is not configured, replace with your Model Studio API Key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// Using qvq-max as an example, can be replaced with other model names as needed
.model("qvq-max")
.messages(Arrays.asList(Msg))
.incrementalOutput(true)
.build();
}
public static void streamCallWithMessage(MultiModalConversation conv, MultiModalMessage Msg)
throws NoApiKeyException, ApiException, InputRequiredException, UploadFileException {
MultiModalConversationParam param = buildMultiModalConversationParam(Msg);
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
result.blockingForEach(message -> {
handleGenerationResult(message);
});
}
public static void main(String[] args) {
try {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
// First image link
Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"),
// If using a local image, uncomment the line below
            // new HashMap<String, Object>(){{ put("image", filePath); }},
// Second image link
Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"),
// Third image link
Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/hbygyo/rabbit.jpg"),
Collections.singletonMap("text", "What do these images depict?"))).build();
streamCallWithMessage(conv, userMessage);
// Print final result
// if (reasoningContent.length() > 0) {
// System.out.println("\n====================Complete Response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | UploadFileException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
}
}
curl
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
--data '{
"model": "qvq-max",
"input":{
"messages":[
{
"role": "user",
"content": [
{"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
{"image": "https://img.alicdn.com/imgextra/i1/O1CN01ukECva1cisjyK6ZDK_!!6000000003635-0-tps-1500-1734.jpg"},
{"text": "Answer the question in the first image, then interpret the article in the second image."}
]
}
]
}
}'
Video understanding
You can input videos either as a list of images (frame sequence) or as a video file.
Image list
An image list must contain at least 4 and at most 512 images.
The following sample code uses image URLs for the frame sequence. To pass local images, see Using local files (Base64 encoding).
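If your frames come from a local video, you first need to turn it into such a list yourself. A sketch using OpenCV (an assumption of this example, not a requirement of the API; install with pip install opencv-python):
import base64
import cv2  # assumption: OpenCV, installed separately

def video_to_frame_urls(path, every_n=15, max_frames=512):
    """Sample every `every_n`-th frame and return Base64 data URLs."""
    cap = cv2.VideoCapture(path)
    frames, idx = [], 0
    while len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            ok, buf = cv2.imencode(".jpg", frame)
            if ok:
                b64 = base64.b64encode(buf.tobytes()).decode("utf-8")
                frames.append(f"data:image/jpeg;base64,{b64}")
        idx += 1
    cap.release()
    if len(frames) < 4:
        raise ValueError("QVQ requires at least 4 images in a video list")
    return frames

# Pass the returned list as the "video" value in the user message:
# {"type": "video", "video": video_to_frame_urls("clip.mp4")}  # placeholder file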
OpenAI
Python
import os
from openai import OpenAI
client = OpenAI(
# If environment variable is not configured, replace with your Model Studio API Key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
reasoning_content = "" # Define complete thinking process
answer_content = "" # Define complete response
is_answering = False # Determine if thinking process has ended and response has begun
completion = client.chat.completions.create(
model="qvq-max",
messages=[{"role": "user","content": [
# When passing an image list, the "type" parameter in the user message is "video"
# When using the OpenAI SDK, the image sequence is by default extracted from the video at intervals of 0.5 seconds and does not support modification. If you need to customize the frame extraction frequency, please use the DashScope SDK.
{"type": "video","video": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"]},
{"type": "text","text": "Describe the specific process of this video"},
]}],
stream=True,
# Uncomment the following to return token usage in the last chunk
# stream_options={
# "include_usage": True
# }
)
print("\n" + "=" * 20 + "Reasoning Process" + "=" * 20 + "\n")
for chunk in completion:
# If chunk.choices is empty, print usage
if not chunk.choices:
print("\nUsage:")
print(chunk.usage)
else:
delta = chunk.choices[0].delta
# Print thinking process
if hasattr(delta, 'reasoning_content') and delta.reasoning_content != None:
print(delta.reasoning_content, end='', flush=True)
reasoning_content += delta.reasoning_content
else:
# Start response
if delta.content != "" and is_answering is False:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
is_answering = True
# Print response process
print(delta.content, end='', flush=True)
answer_content += delta.content
# print("=" * 20 + "Complete Reasoning Process" + "=" * 20 + "\n")
# print(reasoning_content)
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(answer_content)
Node.js
import OpenAI from "openai";
import process from 'process';
// Initialize openai client
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY, // Read from environment variable
baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
let messages = [{
role: "user",
content: [
{
// When passing an image list, the "type" parameter in the user message is "video"
// When using the OpenAI SDK, the image sequence is by default extracted from the video at intervals of 0.5 seconds and does not support modification. If you need to customize the frame extraction frequency, please use the DashScope SDK.
type: "video",
video: [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
]
},
{
type: "text",
text: "Describe the specific process of this video"
}
]
}]
async function main() {
try {
const stream = await openai.chat.completions.create({
model: 'qvq-max',
messages: messages,
stream: true
});
console.log('\n' + '='.repeat(20) + 'Reasoning Process' + '='.repeat(20) + '\n');
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\nUsage:');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Process thinking process
if (delta.reasoning_content) {
process.stdout.write(delta.reasoning_content);
reasoningContent += delta.reasoning_content;
}
// Process formal response
else if (delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
} catch (error) {
console.error('Error:', error);
}
}
main();
HTTP
curl
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qvq-max",
"messages": [{"role": "user",
"content": [{"type": "video",
"video": ["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"]},
{"type": "text",
"text": "Describe the specific process of this video"}]}],
"stream":true,
"stream_options":{"include_usage":true}
}'
DashScope
Python
import os
import dashscope
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [{"role": "user",
"content": [
{"video":["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
"fps":2}, # When using qvq, you can specify the fps parameter. It indicates that the image sequence is extracted from the video at intervals of 1/fps seconds.
{"text": "Describe the specific process of this video"}]}]
response = dashscope.MultiModalConversation.call(
# If environment variable is not configured, replace with your Model Studio API Key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
model='qvq-max',
messages=messages,
stream=True
)
# Define complete thinking process
reasoning_content = ""
# Define complete response
answer_content = ""
# Determine if thinking process has ended and response has begun
is_answering = False
print("=" * 20 + "Reasoning Process" + "=" * 20)
for chunk in response:
# If both thinking process and response are empty, ignore
message = chunk.output.choices[0].message
reasoning_content_chunk = message.get("reasoning_content", None)
if (chunk.output.choices[0].message.content == [] and
reasoning_content_chunk == ""):
pass
else:
# If current is thinking process
if reasoning_content_chunk != None and chunk.output.choices[0].message.content == []:
print(chunk.output.choices[0].message.reasoning_content, end="")
reasoning_content += chunk.output.choices[0].message.reasoning_content
# If current is response
elif chunk.output.choices[0].message.content != []:
if not is_answering:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20)
is_answering = True
print(chunk.output.choices[0].message.content[0]["text"], end="")
answer_content += chunk.output.choices[0].message.content[0]["text"]
# If you need to print the complete thinking process and complete response, uncomment the following code
# print("=" * 20 + "Complete Reasoning Process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(f"{answer_content}")
Java
// dashscope SDK version >= 2.19.0
import java.util.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.exception.InputRequiredException;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(MultiModalConversationResult message) {
String re = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String reasoning = Objects.isNull(re) ? "" : re; // Default value
        List<Map<String, Object>> content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (!reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Reasoning Process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (Objects.nonNull(content) && !content.isEmpty()) {
Object text = content.get(0).get("text");
finalContent.append(text);
if (!isFirstPrint) {
System.out.println("\n====================Complete Response====================");
isFirstPrint = true;
}
System.out.print(text);
}
}
public static MultiModalConversationParam buildMultiModalConversationParam(MultiModalMessage Msg) {
return MultiModalConversationParam.builder()
// If environment variable is not configured, replace with your Model Studio API Key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// Using qvq-max as an example, can be replaced with other model names as needed
.model("qvq-max")
.messages(Arrays.asList(Msg))
.incrementalOutput(true)
.build();
}
public static void streamCallWithMessage(MultiModalConversation conv, MultiModalMessage Msg)
throws NoApiKeyException, ApiException, InputRequiredException, UploadFileException {
MultiModalConversationParam param = buildMultiModalConversationParam(Msg);
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
result.blockingForEach(message -> {
handleGenerationResult(message);
});
}
public static void main(String[] args) {
try {
MultiModalConversation conv = new MultiModalConversation();
            Map<String, Object> params = Map.of(
"video", Arrays.asList("https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg")
);
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
params,
Collections.singletonMap("text", "Describe the specific process of this video")))
.build();
streamCallWithMessage(conv, userMessage);
// Print final result
// if (reasoningContent.length() > 0) {
// System.out.println("\n====================Complete Response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | UploadFileException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
    }
}
HTTP
curl
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
"model": "qvq-max",
"input": {
"messages": [
{
"role": "user",
"content": [
{
"video": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
]
},
{
"text": "Describe the specific process of this video"
}
]
}
]
}
}'
Video file
Size:
For video URL: Up to 1 GB.
For a local file: when using the OpenAI SDK, the Base64-encoded video must be smaller than 10 MB. See Using local files (Base64 encoded).
Format: MP4, AVI, MKV, MOV, FLV, WMV.
Duration: From 2 seconds to 10 minutes.
Dimensions: No restrictions. However, video frames are downscaled to approximately 600,000 pixels; larger dimensions do not provide better understanding.
Currently, audio understanding of video files is not supported.
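These size and format limits can be pre-checked locally before you send a request. A minimal sketch in pure Python (duration is not checked, since that requires a media library; lecture.mp4 is a placeholder):
import os

ALLOWED_EXT = {".mp4", ".avi", ".mkv", ".mov", ".flv", ".wmv"}
MAX_URL_BYTES = 1 * 1024**3      # 1 GB limit for video URLs
MAX_BASE64_BYTES = 10 * 1024**2  # 10 MB limit for Base64 via the OpenAI SDK

def check_video_file(path, as_base64=False):
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXT:
        raise ValueError(f"Unsupported format: {ext}")
    size = os.path.getsize(path)
    # Base64 inflates size by about 4/3, so compare the encoded size
    effective = size * 4 // 3 if as_base64 else size
    limit = MAX_BASE64_BYTES if as_base64 else MAX_URL_BYTES
    if effective > limit:
        raise ValueError(f"File too large: {effective} bytes (limit {limit})")

# check_video_file("lecture.mp4", as_base64=True)  # placeholder local file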
The following sample codes are for video URL. To pass local video, see Using local files (Base64 encoded).
OpenAI
Python
import os
from openai import OpenAI
client = OpenAI(
# If environment variable is not configured, replace with your Model Studio API Key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
reasoning_content = "" # Define complete thinking process
answer_content = "" # Define complete response
is_answering = False # Determine if thinking process has ended and response has begun
completion = client.chat.completions.create(
model="qvq-max",
messages=[{"role": "user","content": [
        # When passing a video file, the "type" parameter in the user message is "video_url"
        # When using the OpenAI SDK, frames are extracted from the video at fixed 0.5-second intervals; this cannot be changed. To customize the frame extraction rate, use the DashScope SDK.
{"type": "video_url","video_url":{"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250328/eepdcq/phase_change_480p.mov"} },
{"type": "text","text": "This is the beginning part of the video. Please analyze and guess what knowledge the video is explaining."},
]}],
stream=True,
# Uncomment the following to return token usage in the last chunk
# stream_options={
# "include_usage": True
# }
)
print("\n" + "=" * 20 + "Reasoning Process" + "=" * 20 + "\n")
for chunk in completion:
# If chunk.choices is empty, print usage
if not chunk.choices:
print("\nUsage:")
print(chunk.usage)
else:
delta = chunk.choices[0].delta
# Print thinking process
if hasattr(delta, 'reasoning_content') and delta.reasoning_content != None:
print(delta.reasoning_content, end='', flush=True)
reasoning_content += delta.reasoning_content
else:
# Start response
if delta.content != "" and is_answering is False:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
is_answering = True
# Print response process
print(delta.content, end='', flush=True)
answer_content += delta.content
# print("=" * 20 + "Complete Reasoning Process" + "=" * 20 + "\n")
# print(reasoning_content)
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(answer_content)
Node.js
import OpenAI from "openai";
import process from 'process';
// Initialize openai client
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY, // Read from environment variable
baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
let messages = [
{
role: "user",
content: [
// When using the OpenAI SDK, the image sequence is by default extracted from the video at intervals of 0.5 seconds and does not support modification. If you need to customize the frame extraction frequency, please use the DashScope SDK.
{ type: "video_url", video_url: { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250328/eepdcq/phase_change_480p.mov" } },
{ type: "text", text: "This is the beginning part of the video. Please analyze and guess what knowledge the video is explaining." },
]
}]
async function main() {
try {
const stream = await openai.chat.completions.create({
model: 'qvq-max',
messages: messages,
stream: true
});
console.log('\n' + '='.repeat(20) + 'Reasoning Process' + '='.repeat(20) + '\n');
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\nUsage:');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Process thinking process
if (delta.reasoning_content) {
process.stdout.write(delta.reasoning_content);
reasoningContent += delta.reasoning_content;
}
// Process formal response
else if (delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
} catch (error) {
console.error('Error:', error);
}
}
main();
HTTP
curl
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "qvq-max",
"messages": [
{
"role": "user",
"content": [
{
"type": "video_url",
"video_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250328/eepdcq/phase_change_480p.mov"
}
},
{
"type": "text",
"text": "This is the beginning part of the video. Please analyze and guess what knowledge the video is explaining."
}
]
}
],
"stream":true,
"stream_options":{"include_usage":true}
}'
DashScope
Python
import os
import dashscope
from dashscope import MultiModalConversation
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{
"role": "user",
"content": [
# You can specify the fps parameter. It indicates that the image sequence is extracted from the video at intervals of 1/fps seconds. For instructions, see https://www.alibabacloud.com/help/en/model-studio/use-qwen-by-calling-api?#2ed5ee7377fum
{"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250328/eepdcq/phase_change_480p.mov,"fps":2"},
{"text": "This is the beginning part of the video. Please analyze and guess what knowledge the video is explaining."}
]
}
]
response = MultiModalConversation.call(
# If environment variable is not configured, replace with your Model Studio API Key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model="qvq-max", # Using qvq-max as an example, can be replaced with other model names as needed
messages=messages,
stream=True,
)
# Define complete thinking process
reasoning_content = ""
# Define complete response
answer_content = ""
# Determine if thinking process has ended and response has begun
is_answering = False
print("=" * 20 + "Reasoning Process" + "=" * 20)
for chunk in response:
# If both thinking process and response are empty, ignore
message = chunk.output.choices[0].message
reasoning_content_chunk = message.get("reasoning_content", None)
if (chunk.output.choices[0].message.content == [] and
reasoning_content_chunk == ""):
pass
else:
# If current is thinking process
if reasoning_content_chunk != None and chunk.output.choices[0].message.content == []:
print(chunk.output.choices[0].message.reasoning_content, end="")
reasoning_content += chunk.output.choices[0].message.reasoning_content
# If current is response
elif chunk.output.choices[0].message.content != []:
if not is_answering:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20)
is_answering = True
print(chunk.output.choices[0].message.content[0]["text"], end="")
answer_content += chunk.output.choices[0].message.content[0]["text"]
# If you need to print the complete thinking process and complete response, uncomment the following code
# print("=" * 20 + "Complete Reasoning Process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(f"{answer_content}")
Java
// dashscope SDK version >= 2.19.0
import java.util.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.exception.InputRequiredException;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(MultiModalConversationResult message) {
String re = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String reasoning = Objects.isNull(re) ? "" : re; // Default to empty string
        List<Map<String, Object>> content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (!reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Reasoning Process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (Objects.nonNull(content) && !content.isEmpty()) {
            Object text = content.get(0).get("text");
            finalContent.append(text);
if (!isFirstPrint) {
System.out.println("\n====================Complete Response====================");
isFirstPrint = true;
}
System.out.print(text);
}
}
public static MultiModalConversationParam buildMultiModalConversationParam(MultiModalMessage Msg) {
return MultiModalConversationParam.builder()
// If environment variable is not configured, replace with your Model Studio API Key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// Using qvq-max as an example, can be replaced with other model names as needed
.model("qvq-max")
.messages(Arrays.asList(Msg))
.incrementalOutput(true)
.build();
}
public static void streamCallWithMessage(MultiModalConversation conv, MultiModalMessage Msg)
throws NoApiKeyException, ApiException, InputRequiredException, UploadFileException {
MultiModalConversationParam param = buildMultiModalConversationParam(Msg);
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
result.blockingForEach(message -> {
handleGenerationResult(message);
});
}
public static void main(String[] args) {
try {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMsg = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("video", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250328/eepdcq/phase_change_480p.mov"),
Collections.singletonMap("text", "This is the beginning part of the video. Please analyze and guess what knowledge the video is explaining.")))
.build();
streamCallWithMessage(conv, userMsg);
// Print final result
// if (reasoningContent.length() > 0) {
// System.out.println("\n====================Complete Response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | UploadFileException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
}}
HTTP
curl
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
"model": "qvq-max",
"input":{
"messages":[
{
"role": "user",
"content": [
{"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250328/eepdcq/phase_change_480p.mov"},
{"text": "This is the beginning part of the video. Please analyze and guess what knowledge the video is explaining."}
]
}
]
}
}'
Using local files (Base64-encoded input)
Here are sample codes for passing local files. Currently, only the OpenAI SDK and HTTP methods support local files.
Image
Check the limitations on input images section. To pass image URLs, see Get started.
The Base64-encoded image must be less than 10 MB.
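Because Base64 encoding inflates the payload by roughly one third, the practical ceiling for the source file is around 7.5 MB. If you want to fail fast before calling the API, a minimal size check might look like the following sketch (the helper name and error handling are illustrative, not part of either SDK):
import base64

def encode_image_checked(image_path, max_mb=10):
    # Encode the file and verify the Base64 payload stays under the limit
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    size_mb = len(encoded) / (1024 * 1024)
    if size_mb >= max_mb:
        raise ValueError(f"Encoded image is {size_mb:.1f} MB; the limit is {max_mb} MB")
    return encoded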
OpenAI
Python
from openai import OpenAI
import os
import base64
# base64 encoding format
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
# Replace xxx/test.jpg with the absolute path of your local image
base64_image = encode_image("xxx/test.jpg")
# Initialize OpenAI client
client = OpenAI(
# If environment variable is not configured, replace with Alibaba Cloud Model Studio API Key: api_key="sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)
reasoning_content = "" # Define complete reasoning process
answer_content = "" # Define complete response
is_answering = False # Determine whether the reasoning process is finished and the response has started
# Create chat completion request
completion = client.chat.completions.create(
model="qvq-max", # Using qvq-max as an example, you can change the model name as needed
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
# Note that when passing Base64, the image format (i.e., image/{format}) needs to be consistent with the Content Type in the supported image list. "f" is a string formatting method.
# PNG image: f"data:image/png;base64,{base64_image}"
# JPEG image: f"data:image/jpeg;base64,{base64_image}"
# WEBP image: f"data:image/webp;base64,{base64_image}"
"image_url": {"url": f"data:image/png;base64,{base64_image}"},
},
{"type": "text", "text": "How do I solve this problem?"},
],
}
],
stream=True,
# Uncomment the following to return token usage in the last chunk
# stream_options={
# "include_usage": True
# }
)
print("\n" + "=" * 20 + "Reasoning Process" + "=" * 20 + "\n")
for chunk in completion:
# If chunk.choices is empty, print usage
if not chunk.choices:
print("\nUsage:")
print(chunk.usage)
else:
delta = chunk.choices[0].delta
# Print reasoning process
        if hasattr(delta, 'reasoning_content') and delta.reasoning_content is not None:
print(delta.reasoning_content, end='', flush=True)
reasoning_content += delta.reasoning_content
else:
# Start response
if delta.content != "" and is_answering is False:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
is_answering = True
# Print response process
print(delta.content, end='', flush=True)
answer_content += delta.content
# print("=" * 20 + "Complete Reasoning Process" + "=" * 20 + "\n")
# print(reasoning_content)
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(answer_content)
Node.js
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// If environment variable is not configured, replace the following line with: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const encodeImage = (imagePath) => {
const imageFile = readFileSync(imagePath);
return imageFile.toString('base64');
};
// Replace xxx/test.jpg with the absolute path of your local image
const base64Image = encodeImage("xxx/test.jpg")
let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
let messages = [
{
role: "user",
content: [
{ type: "image_url", image_url: {"url": `data:image/png;base64,${base64Image}`} },
{ type: "text", text: "Please solve this problem" },
]
}]
async function main() {
try {
const stream = await openai.chat.completions.create({
model: 'qvq-max',
messages: messages,
stream: true
});
console.log('\n' + '='.repeat(20) + 'Reasoning Process' + '='.repeat(20) + '\n');
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\nUsage:');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Process reasoning
if (delta.reasoning_content) {
process.stdout.write(delta.reasoning_content);
reasoningContent += delta.reasoning_content;
}
// Process formal response
else if (delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
} catch (error) {
console.error('Error:', error);
}
}
main();
Video
Video file
The Base64-encoded video must be less than 10 MB.
Python
from openai import OpenAI
import os
import base64
# base64 encoding format
def encode_video(video_path):
with open(video_path, "rb") as video_file:
return base64.b64encode(video_file.read()).decode("utf-8")
# Replace xxx/test.mp4 with the absolute path of your local video
base64_video = encode_video("xxx/test.mp4")
# Initialize OpenAI client
client = OpenAI(
# If environment variable is not configured, replace with Alibaba Cloud Model Studio API Key: api_key="sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)
reasoning_content = "" # Define complete reasoning process
answer_content = "" # Define complete response
is_answering = False # Determine whether the reasoning process is finished and the response has started
# Create chat completion request
completion = client.chat.completions.create(
model="qvq-max", # Using qvq-max as an example, you can change the model name as needed
messages=[
{
"role": "user",
"content": [
{
"type": "video_url",
# Note that when passing Base64, change video/mp4 to match your local video file
"video_url": {"url": f"data:video/mp4;base64,{base64_video}"},
},
{"type": "text", "text": "What is this video about?"},
],
}
],
stream=True,
# Uncomment the following to return token usage in the last chunk
# stream_options={
# "include_usage": True
# }
)
print("\n" + "=" * 20 + "Reasoning Process" + "=" * 20 + "\n")
for chunk in completion:
# If chunk.choices is empty, print usage
if not chunk.choices:
print("\nUsage:")
print(chunk.usage)
else:
delta = chunk.choices[0].delta
# Print reasoning process
        if hasattr(delta, 'reasoning_content') and delta.reasoning_content is not None:
print(delta.reasoning_content, end='', flush=True)
reasoning_content += delta.reasoning_content
else:
# Start response
if delta.content != "" and is_answering is False:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
is_answering = True
# Print response process
print(delta.content, end='', flush=True)
answer_content += delta.content
# print("=" * 20 + "Complete Reasoning Process" + "=" * 20 + "\n")
# print(reasoning_content)
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(answer_content)
Node.js
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// If environment variable is not configured, replace the following line with: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const encodeVideo = (videoPath) => {
const videoFile = readFileSync(videoPath);
return videoFile.toString('base64');
};
// Replace xxx/test.mp4 with the absolute path of your local video
const base64Video = encodeVideo("xxx/test.mp4")
let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
let messages = [
{
role: "user",
content: [
// Note that when passing Base64, change video/mp4 to match your local video file
{ type: "video_url", video_url: {"url": `data:video/mp4;base64,${base64Video}`} },
{ type: "text", text: "What is this video about?" },
]
}]
async function main() {
try {
const stream = await openai.chat.completions.create({
model: 'qvq-max',
messages: messages,
stream: true
});
console.log('\n' + '='.repeat(20) + 'Reasoning Process' + '='.repeat(20) + '\n');
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\nUsage:');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Process reasoning
if (delta.reasoning_content) {
process.stdout.write(delta.reasoning_content);
reasoningContent += delta.reasoning_content;
}
// Process formal response
else if (delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
} catch (error) {
console.error('Error:', error);
}
}
main();
Image list
Each Base64-encoded video frame must be less than 10 MB.
Python
import os
from openai import OpenAI
import base64
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
base64_image1 = encode_image("football1.jpg")
base64_image2 = encode_image("football2.jpg")
base64_image3 = encode_image("football3.jpg")
base64_image4 = encode_image("football4.jpg")
client = OpenAI(
# If environment variable is not configured, replace the following line with: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
reasoning_content = "" # Define complete reasoning process
answer_content = "" # Define complete response
is_answering = False # Determine whether the reasoning process is finished and the response has started
completion = client.chat.completions.create(
model="qvq-max",
messages=[{"role": "user","content": [
# When passing an image list, the "type" parameter in the user message is "video"
{"type": "video","video": [
f"data:image/png;base64,{base64_image1}",
f"data:image/png;base64,{base64_image2}",
f"data:image/png;base64,{base64_image3}",
f"data:image/png;base64,{base64_image4}",]},
{"type": "text","text": "Describe the specific process in this video?"},
]}],
stream=True,
# Uncomment the following to return token usage in the last chunk
# stream_options={
# "include_usage": True
# }
)
print("\n" + "=" * 20 + "Reasoning Process" + "=" * 20 + "\n")
for chunk in completion:
# If chunk.choices is empty, print usage
if not chunk.choices:
print("\nUsage:")
print(chunk.usage)
else:
delta = chunk.choices[0].delta
# Print reasoning process
        if hasattr(delta, 'reasoning_content') and delta.reasoning_content is not None:
print(delta.reasoning_content, end='', flush=True)
reasoning_content += delta.reasoning_content
else:
# Start response
if delta.content != "" and is_answering is False:
print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
is_answering = True
# Print response process
print(delta.content, end='', flush=True)
answer_content += delta.content
# print("=" * 20 + "Complete Reasoning Process" + "=" * 20 + "\n")
# print(reasoning_content)
# print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
# print(answer_content)
Node.js
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// If environment variable is not configured, replace the following line with: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const encodeImage = (imagePath) => {
const imageFile = readFileSync(imagePath);
return imageFile.toString('base64');
};
const base64Image1 = encodeImage("football1.jpg")
const base64Image2 = encodeImage("football2.jpg")
const base64Image3 = encodeImage("football3.jpg")
const base64Image4 = encodeImage("football4.jpg")
let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
let messages = [{
role: "user",
content: [
{
// When passing an image list, the "type" parameter in the user message is "video"
type: "video",
video: [
`data:image/png;base64,${base64Image1}`,
`data:image/png;base64,${base64Image2}`,
`data:image/png;base64,${base64Image3}`,
`data:image/png;base64,${base64Image4}`
]
},
{
type: "text",
text: "Describe the specific process in this video"
}
]
}]
async function main() {
try {
const stream = await openai.chat.completions.create({
model: 'qvq-max',
messages: messages,
stream: true
});
console.log('\n' + '='.repeat(20) + 'Reasoning Process' + '='.repeat(20) + '\n');
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\nUsage:');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Process reasoning
if (delta.reasoning_content) {
process.stdout.write(delta.reasoning_content);
reasoningContent += delta.reasoning_content;
}
// Process formal response
else if (delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Complete Response' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
} catch (error) {
console.error('Error:', error);
}
}
main();
Usage notes
Supported image formats
Here are the supported image formats. When using the OpenAI SDK to input local images, set image/{format} according to the Content Type column.
Image format | File name extension | Content Type |
BMP | .bmp | image/bmp |
JPEG | .jpe, .jpeg, .jpg | image/jpeg |
PNG | .png | image/png |
TIFF | .tif, .tiff | image/tiff |
WEBP | .webp | image/webp |
HEIC | .heic | image/heic |
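When you build the data URL for a local image, the image/{format} prefix must match the Content Type column above. A small lookup helper might look like this sketch (illustrative only, not part of either SDK):
import os

# Extension-to-Content-Type mapping taken from the table above
CONTENT_TYPES = {
    ".bmp": "image/bmp",
    ".jpe": "image/jpeg", ".jpeg": "image/jpeg", ".jpg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff", ".tiff": "image/tiff",
    ".webp": "image/webp",
    ".heic": "image/heic",
}

def data_url(image_path, base64_image):
    ext = os.path.splitext(image_path)[1].lower()
    content_type = CONTENT_TYPES[ext]  # KeyError here means an unsupported format
    return f"data:{content_type};base64,{base64_image}"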
Image size limitations
The size of a single image file must not exceed 10 MB. When using the OpenAI SDK, the Base64-encoded image must also not exceed 10 MB.
The width and height of an image must both be greater than 10 pixels. The aspect ratio must not exceed 200:1 or 1:200.
There is no pixel count limit for a single image, because the model scales and preprocesses the image before understanding it. Larger images do not necessarily improve understanding performance. Recommended pixel values:
For a single image input to qvq-max, qvq-max-latest, or qvq-max-2025-03-25, the number of pixels should not exceed 1,003,520.
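If you want to reject invalid images before calling the API, these constraints are easy to check locally. Here is a sketch using Pillow (an assumed dependency; any library that reports image dimensions works equally well):
import os
from PIL import Image  # pip install pillow

def check_image(image_path, max_pixels=1_003_520):
    if os.path.getsize(image_path) > 10 * 1024 * 1024:
        raise ValueError("image file exceeds 10 MB")
    with Image.open(image_path) as img:
        width, height = img.size
    if width <= 10 or height <= 10:
        raise ValueError("width and height must each be greater than 10 pixels")
    if max(width, height) / min(width, height) > 200:
        raise ValueError("aspect ratio must not exceed 200:1 (or 1:200)")
    if width * height > max_pixels:
        # Not an error: the model will scale the image down during preprocessing
        print(f"Note: {width}x{height} exceeds {max_pixels} pixels; the model will downscale it")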
Image number limitations
In multi-image input, the number of images is limited by the model's total token limit for text and images (maximum input). The total token count of all images must be less than the model's maximum input.
For example, qvq-max has a maximum input of 106,496 tokens. The default token limit is 1,280 per image. You can set the vl_high_resolution_images parameter in DashScope to increase the token limit to 16,384 per image. If your input images are all 1280 × 1280:
Token limit per image | Adjusted image | Image tokens | Maximum number of images |
1,280 (default) | 980 × 980 | 1,227 | 86 |
16,384 | 1288 × 1288 | 2,118 | 50 |
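The maximum image counts in this table follow directly from the token budget: divide the model's maximum input by the per-image token cost and round down. In practice the text prompt also consumes tokens, so the real ceiling is slightly lower. Reproducing the table's numbers:
max_input = 106_496  # qvq-max maximum input tokens

# Default limit: a 1280 x 1280 image is adjusted to 980 x 980 and costs 1,227 tokens
print(max_input // 1_227)  # 86

# With vl_high_resolution_images: adjusted to 1288 x 1288, costing 2,118 tokens
print(max_input // 2_118)  # 50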
API reference
For input and output parameter details, see Qwen.
Error codes
If the call failed and an error message is returned, see Error messages.