The Qwen-VL model can answer questions about the images or videos that you provide. It supports single-image and multi-image inputs for tasks such as image captioning, visual question answering, and object detection.
Try it online: Vision model (Singapore or Beijing)
Getting started
Prerequisites
Obtain an API key and configure it as an environment variable.
If you use an SDK to call the model, install the SDK first. The DashScope Python SDK must be version 1.24.6 or later, and the DashScope Java SDK must be version 2.21.10 or later.
The following examples show how to call the model to describe the content of an image. For more information about local files and image limits, see Upload local files and Input file limits.
Single-image input
OpenAI compatible
Python
from openai import OpenAI
import os
client = OpenAI(
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)
completion = client.chat.completions.create(
model="qwen3-vl-plus", # This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/model-studio/getting-started/models
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
},
},
{"type": "text", "text": "What is depicted in the image?"},
],
},
],
)
print(completion.choices[0].message.content)
Response
This is a photo taken on a beach. In the photo, a person and a dog are sitting on the sand with the sea and sky in the background. The person and the dog appear to be interacting, with the dog's front paw resting on the person's hand. Sunlight shines from the right side of the frame, adding a warm atmosphere to the scene.
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
// The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
// The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
});
async function main() {
const response = await openai.chat.completions.create({
model: "qwen3-vl-plus", // This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/model-studio/getting-started/models
messages: [
{
role: "user",
content: [{
type: "image_url",
image_url: {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
}
},
{
type: "text",
text: "What is depicted in the image?"
}
]
}
]
});
console.log(response.choices[0].message.content);
}
main()
Response
This is a photo taken on a beach. In the photo, a person and a dog are sitting on the sand with the sea and sky in the background. The person and the dog appear to be interacting, with the dog's front paw resting on the person's hand. Sunlight shines from the right side of the frame, adding a warm atmosphere to the scene.
curl
# ======= Important =======
# The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen3-vl-plus",
"messages": [
{"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}},
{"type": "text", "text": "What is depicted in the image?"}
]
}]
}'
Response
{
"choices": [
{
"message": {
"content": "This is a photo taken on a beach. In the photo, a person and a dog are sitting on the sand with the sea and sky in the background. The person and the dog appear to be interacting, with the dog's front paw resting on the person's hand. Sunlight shines from the right side of the frame, adding a warm atmosphere to the scene.",
"role": "assistant"
},
"finish_reason": "stop",
"index": 0,
"logprobs": null
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 1270,
"completion_tokens": 54,
"total_tokens": 1324
},
"created": 1725948561,
"system_fingerprint": null,
"model": "qwen3-vl-plus",
"id": "chatcmpl-0fd66f46-b09e-9164-a84f-3ebbbedbac15"
}
DashScope
Python
import os
import dashscope
# The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{
"role": "user",
"content": [
{"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
{"text": "What is depicted in the image?"}]
}]
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen3-vl-plus', # This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/model-studio/getting-started/models
messages=messages
)
print(response.output.choices[0].message.content[0]["text"])
Response
This is a photo taken on a beach. In the photo, there is a woman and a dog. The woman is sitting on the sand, smiling and interacting with the dog. The dog is wearing a collar and appears to be shaking hands with the woman. The background is the sea and the sky, and the sunlight shining on them creates a warm atmosphere.
Java
import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
// The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
static {
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"),
Collections.singletonMap("text", "What is depicted in the image?"))).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-vl-plus") // This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/model-studio/getting-started/models
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Response
This is a photo taken on a beach. In the photo, there is a person in a plaid shirt and a dog wearing a collar. The person and the dog are sitting face to face, seemingly interacting. The background is the sea and the sky, and the sunlight shining on them creates a warm atmosphere.
curl
# ======= Important =======
# The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"input":{
"messages":[
{
"role": "user",
"content": [
{"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
{"text": "What is depicted in the image?"}
]
}
]
}
}'
Response
{
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": [
{
"text": "This is a photo taken on a beach. In the photo, there is a person in a plaid shirt and a dog wearing a collar. They are sitting on the sand with the sea and sky in the background. Sunlight shines from the right side of the frame, adding a warm atmosphere to the scene."
}
]
}
}
]
},
"usage": {
"output_tokens": 55,
"input_tokens": 1271,
"image_tokens": 1247
},
"request_id": "ccf845a3-dc33-9cda-b581-20fe7dc23f70"
}
Multi-image input
The Qwen-VL model supports passing multiple images in a single request. This is useful for tasks such as product comparison and multi-page document processing. To do this, include multiple image objects in the content array of the user message.
The number of images is limited by the model's maximum input tokens. The total number of tokens for all images and text must not exceed the model's token limit.
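As a rough pre-flight check, you can estimate image token usage on the client side. The following Python sketch assumes the commonly documented Qwen-VL rule that each 28 × 28 pixel patch of the resized image consumes one token; server-side resizing and per-model limits can change the real count, so treat the result as an estimate only. The function and the 1280-token default cap are illustrative, not part of the SDK.
import math

def estimate_image_tokens(width: int, height: int, max_tokens: int = 1280) -> int:
    # Assumption: one token per 28x28 pixel patch, capped by the per-image limit.
    patches = math.ceil(width / 28) * math.ceil(height / 28)
    return min(patches, max_tokens)

# Example: estimate the image cost of a two-image request before sending it.
images = [(1024, 768), (800, 600)]
print("Estimated image tokens:", sum(estimate_image_tokens(w, h) for w, h in images))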
OpenAI compatible
Python
import os
from openai import OpenAI
client = OpenAI(
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-vl-plus", # This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/model-studio/getting-started/models
messages=[
{"role": "user","content": [
{"type": "image_url","image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},},
{"type": "image_url","image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"},},
{"type": "text", "text": "What do these images depict?"},
],
}
],
)
print(completion.choices[0].message.content)
Response
Image 1 shows a scene of a woman and a Labrador retriever interacting on a beach. The woman is wearing a plaid shirt and sitting on the sand, shaking hands with the dog. The background is the ocean waves and the sky, and the whole picture is filled with a warm and pleasant atmosphere.
Image 2 shows a scene of a tiger walking in a forest. The tiger's fur is orange with black stripes. It is stepping forward, surrounded by dense trees and vegetation. The ground is covered with fallen leaves, and the whole picture gives a feeling of wild nature.
Node.js
import OpenAI from "openai";
const openai = new OpenAI(
{
// The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
// The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
async function main() {
const response = await openai.chat.completions.create({
model: "qwen3-vl-plus", // This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://help.aliyun.com/en/model-studio/models
messages: [
{role: "user",content: [
{type: "image_url",image_url: {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}},
{type: "image_url",image_url: {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}},
{type: "text", text: "What do these images depict?" },
]}]
});
console.log(response.choices[0].message.content);
}
main()
Response
In the first image, a person and a dog are interacting on a beach. The person is wearing a plaid shirt, and the dog is wearing a collar. They seem to be shaking hands or giving a high-five.
In the second image, a tiger is walking in a forest. The tiger's fur is orange with black stripes, and the background is green trees and vegetation.
curl
# ======= Important =======
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
}
},
{
"type": "image_url",
"image_url": {
"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
}
},
{
"type": "text",
"text": "What do these images depict?"
}
]
}
]
}'
Response
{
"choices": [
{
"message": {
"content": "Image 1 shows a scene of a woman and a Labrador retriever interacting on a beach. The woman is wearing a plaid shirt and sitting on the sand, shaking hands with the dog. The background is a sea view and a sunset sky, making the whole scene look very warm and harmonious.\n\nImage 2 shows a scene of a tiger walking in a forest. The tiger's fur is orange with black stripes. It is stepping forward, surrounded by dense trees and vegetation. The ground is covered with fallen leaves, and the whole picture is full of natural wildness and vitality.",
"role": "assistant"
},
"finish_reason": "stop",
"index": 0,
"logprobs": null
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 2497,
"completion_tokens": 109,
"total_tokens": 2606
},
"created": 1725948561,
"system_fingerprint": null,
"model": "qwen3-vl-plus",
"id": "chatcmpl-0fd66f46-b09e-9164-a84f-3ebbbedbac15"
}
DashScope
Python
import os
import dashscope
# The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{
"role": "user",
"content": [
{"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
{"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"},
{"text": "What do these images depict?"}
]
}
]
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen3-vl-plus', # This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/model-studio/getting-started/models
messages=messages
)
print(response.output.choices[0].message.content[0]["text"])
Response
These images show some animals and natural scenes. In the first image, a person and a dog are interacting on a beach. The second image is of a tiger walking in a forest.
Java
import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"),
Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"),
Collections.singletonMap("text", "What do these images depict?"))).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-vl-plus") // This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/model-studio/getting-started/models
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text")); }
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Response
These images show some animals and natural scenes.
1. First image: A woman and a dog are interacting on a beach. The woman is wearing a plaid shirt and sitting on the sand, and the dog is wearing a collar and extending its paw to shake hands with the woman.
2. Second image: A tiger is walking in a forest. The tiger's fur is orange with black stripes, and the background is trees and leaves.
curl
# ======= Important =======
# The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"input":{
"messages":[
{
"role": "user",
"content": [
{"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
{"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"},
{"text": "What do these images show?"}
]
}
]
}
}'
Response
{
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": [
{
"text": "These images show some animals and natural scenes. In the first image, a person and a dog are interacting on a beach. The second image is of a tiger walking in a forest."
}
]
}
}
]
},
"usage": {
"output_tokens": 81,
"input_tokens": 1277,
"image_tokens": 2497
},
"request_id": "ccf845a3-dc33-9cda-b581-20fe7dc23f70"
}
Model selection
For tasks such as high-precision object detection and localization (including 3D localization), agent tool calling, document and webpage parsing, complex problem-solving, and long video understanding, we recommend using Qwen3-VL. A comparison of the models in this series is as follows:
qwen3-vl-plus: The most powerful model.
qwen3-vl-flash: A faster and more cost-effective option. It provides a balance between performance and cost, making it suitable for scenarios that are sensitive to response speed.
For general tasks such as simple image captioning and short video summary extraction, you can choose Qwen2.5-VL. A comparison of the models in this series is as follows:
qwen-vl-max: The best-performing version in the Qwen2.5-VL series.
qwen-vl-plus: Faster and provides a good balance between performance and cost.
For more information about model names, context, prices, and snapshot versions, see Model list. For information about concurrent request limits, see Throttling.
Feature overview
Image Q&A
Describe, classify, or tag the content of an image, such as identifying people, places, animals, and plants.
If the sun is too bright, what item from this image should I use? | When the sun is too bright, you should use the pink sunglasses from the image. Sunglasses can effectively block strong light, reduce UV damage to your eyes, and help protect your vision and improve visual comfort in bright sunlight. |
Creative writing
Generate vivid text descriptions from image or video content. This feature is suitable for creative scenarios such as story writing, copywriting, and short video scripts.
Please help me write an interesting social media post based on the content of the image. | Of course. This image is full of the rich atmosphere of the Mid-Autumn Festival, with a clever combination of classical and modern elements. Based on the main elements in the image, I have prepared several social media posts in different styles for you. You can choose the one you like. Poetic and beautiful style: Tonight, the moon is bright for all to see, but where do the autumn thoughts fall? Chang'e flies to the moon, and the Jade Rabbit pounds medicine. The romance of the ancients is lit up tonight. May this bright moon light your way home and carry my deepest thoughts. Happy Mid-Autumn Festival! Warm blessing style: The moon is full, and people are reunited. The Mid-Autumn night is the gentlest. Watch the fireworks bloom, admire the full moon in the sky, eat a piece of mooncake, and wish for good health. May all that you and I wish for come true. Happy Mid-Autumn Festival to everyone, and may your family be happy! |
Text recognition and information extraction
Recognize text and formulas in images, or extract information from receipts, certificates, and forms. This feature supports formatted text output. The Qwen3-VL model increases the number of supported languages to 33. For a list of supported languages, see Model feature comparison.
Extract the following from the image: ['Invoice Code', 'Invoice Number', 'Destination', 'Fuel Surcharge', 'Fare', 'Travel Date', 'Departure Time', 'Train Number', 'Seat Number']. Please output in JSON format. | { "Invoice Code": "221021325353", "Invoice Number": "10283819", "Destination": "Development Zone", "Fuel Surcharge": "2.0", "Fare": "8.00<Full>", "Travel Date": "2013-06-29", "Departure Time": "Serial", "Train Number": "040", "Seat Number": "371" } |
Multi-disciplinary problem-solving
Solve problems from subjects such as mathematics, physics, and chemistry that are presented in images. This feature is suitable for various educational levels, from primary school to university and adult education.
Please solve the math problem in the image step by step. | [Step-by-step solution rendered as an image; omitted] |
Visual coding
Generate code from images or videos. For example, you can generate HTML, CSS, and JS code from design drafts or website screenshots.
Create a webpage using HTML and CSS based on my sketch. The main color should be black. | [Webpage preview (image omitted)] |
Object detection
Supports 2D and 3D detection. You can use this feature to determine object orientation, perspective changes, and occlusion relationships. 3D detection is a new feature of the Qwen3-VL model.
The object detection performance of the Qwen2.5-VL model is robust within a resolution range of 480 × 480 to 2560 × 2560 pixels. Outside this range, the detection accuracy may decrease, and bounding box drift may occur. For more information about how to draw the detection results on the original image, see the FAQ.
2D detection
[Visualization of 2D detection results (image omitted)]
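For reference, drawing returned 2D boxes on the original image is covered in the FAQ; the following is only a minimal sketch. It assumes you prompted the model to return a JSON list of objects with a bbox_2d field containing absolute pixel coordinates [x1, y1, x2, y2] and a label field, and that the image is available locally. The sample output and file names are placeholders; verify the field names against your actual model output. The sketch uses the Pillow library.
import json
from PIL import Image, ImageDraw

# Hypothetical model output; the real structure depends on your detection prompt.
model_output = '[{"bbox_2d": [135, 42, 410, 388], "label": "dog"}]'

image = Image.open("dog_and_girl.jpeg")  # the same image that was sent to the model
draw = ImageDraw.Draw(image)
for obj in json.loads(model_output):
    x1, y1, x2, y2 = obj["bbox_2d"]
    draw.rectangle([x1, y1, x2, y2], outline="red", width=3)  # draw the bounding box
    draw.text((x1, max(0, y1 - 14)), obj["label"], fill="red")  # label above the box
image.save("detection_result.jpeg")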
Document parsing
Parse image-based documents, such as scanned copies or image PDFs, into QwenVL HTML or QwenVL Markdown format. This format accurately recognizes text and preserves the position information of elements such as images and tables. The Qwen3-VL model adds support for parsing into Markdown format.
The recommended prompts are qwenvl html (to parse into HTML format) and qwenvl markdown (to parse into Markdown format).
qwenvl markdown | [Visualization of parsing results (image omitted)] |
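For example, with the DashScope Python SDK shown earlier, requesting Markdown parsing only requires changing the text prompt to qwenvl markdown. The following is a minimal sketch; the document image URL is a placeholder for your scanned page.
import os
import dashscope

# base_url for the Singapore region; use https://dashscope.aliyuncs.com/api/v1 for Beijing.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [{
    "role": "user",
    "content": [
        {"image": "https://example.com/scanned_page.png"},  # placeholder document image
        {"text": "qwenvl markdown"},  # recommended document-parsing prompt
    ],
}]
response = dashscope.MultiModalConversation.call(
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen3-vl-plus',
    messages=messages,
)
print(response.output.choices[0].message.content[0]["text"])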
Video understanding
Analyze video content, such as locating specific events and obtaining timestamps, or generating summaries of key time periods.
Please describe the series of actions of the person in the video. Output the start time (start_time), end time (end_time), and event (event) in JSON format. Please use HH:mm:ss to represent the timestamp. | { "events": [ { "start_time": "00:00:00", "end_time": "00:00:05", "event": "A person walks towards a table holding a cardboard box and places it on the table." }, { "start_time": "00:00:05", "end_time": "00:00:15", "event": "The person picks up a scanner and scans the label on the cardboard box." }, { "start_time": "00:00:15", "end_time": "00:00:21", "event": "The person puts the scanner back in its place and then picks up a pen to write information in a notebook."}] } |
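When the model returns events in the requested JSON schema, you can post-process them directly. A minimal Python sketch that converts the HH:mm:ss timestamps to second offsets, assuming the model followed the schema exactly (add validation for production use):
import json

# Hypothetical model output following the JSON schema requested in the prompt.
model_output = '{"events": [{"start_time": "00:00:00", "end_time": "00:00:05", "event": "A person places a cardboard box on the table."}]}'

def to_seconds(ts: str) -> int:
    # Convert an HH:mm:ss timestamp to a second offset from the video start.
    h, m, s = (int(part) for part in ts.split(":"))
    return h * 3600 + m * 60 + s

for ev in json.loads(model_output)["events"]:
    print(to_seconds(ev["start_time"]), to_seconds(ev["end_time"]), ev["event"])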
Core features
Enable or disable thinking mode
The qwen3-vl-plus and qwen3-vl-flash series models are hybrid thinking models that can respond either directly or after a thinking process. You can use the enable_thinking parameter to control this behavior:
true: Enables thinking mode.
false (default): Disables thinking mode.
Models with the `thinking` suffix, such as qwen3-vl-235b-a22b-thinking, are thinking-only models. These models always think before responding, and this feature cannot be disabled.
Model configuration: In general conversation scenarios that do not involve agent tool calls, we recommend that you do not set a System Message, to maintain optimal performance. You can pass instructions, such as model role settings and output format requirements, through the User Message.
Prioritize streaming output: When thinking mode is enabled, both streaming and non-streaming output are supported. To avoid timeouts caused by long responses, we recommend using streaming output.
Limit thinking length: Thinking models can sometimes output lengthy reasoning processes. You can use the thinking_budget parameter to limit the length of the thinking process. If the number of tokens generated during the model's thinking process exceeds the thinking_budget, the reasoning content is truncated, and the model immediately starts generating the final response. The default value of thinking_budget is the model's maximum chain-of-thought length. For more information, see Model list.
OpenAI compatible
enable_thinking is not a standard OpenAI parameter. If you use the OpenAI Python SDK, pass it through extra_body.
Python
import os
from openai import OpenAI
client = OpenAI(
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)
reasoning_content = "" # Define the full thinking process
answer_content = "" # Define the full response
is_answering = False # Determine if the thinking process has ended and the response has started
enable_thinking = True
# Create a chat completion request
completion = client.chat.completions.create(
model="qwen3-vl-plus",
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
},
},
{"type": "text", "text": "How do I solve this problem?"},
],
},
],
stream=True,
# The enable_thinking parameter enables the thinking process. The thinking_budget parameter sets the maximum number of tokens for the reasoning process.
# For qwen3-vl-plus and qwen3-vl-flash, you can enable or disable thinking with enable_thinking. For models with the `thinking` suffix, such as qwen3-vl-235b-a22b-thinking, enable_thinking can only be set to true. This does not apply to other Qwen-VL models.
extra_body={
'enable_thinking': enable_thinking,
"thinking_budget": 81920},
# Uncomment the following to return token usage in the last chunk
# stream_options={
# "include_usage": True
# }
)
if enable_thinking:
print("\n" + "=" * 20 + "Thinking process" + "=" * 20 + "\n")
for chunk in completion:
# If chunk.choices is empty, print usage
if not chunk.choices:
print("\nUsage:")
print(chunk.usage)
else:
delta = chunk.choices[0].delta
# Print the thinking process
if hasattr(delta, 'reasoning_content') and delta.reasoning_content is not None:
print(delta.reasoning_content, end='', flush=True)
reasoning_content += delta.reasoning_content
else:
# Start the response
if delta.content != "" and is_answering is False:
print("\n" + "=" * 20 + "Full response" + "=" * 20 + "\n")
is_answering = True
# Print the response process
print(delta.content, end='', flush=True)
answer_content += delta.content
# print("=" * 20 + "Full thinking process" + "=" * 20 + "\n")
# print(reasoning_content)
# print("=" * 20 + "Full response" + "=" * 20 + "\n")
# print(answer_content)
Node.js
import OpenAI from "openai";
// Initialize the OpenAI client
const openai = new OpenAI({
// The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
// The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
});
let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
let enableThinking = true;
let messages = [
{
role: "user",
content: [
{ type: "image_url", image_url: { "url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg" } },
{ type: "text", text: "Solve this problem" },
]
}]
async function main() {
try {
const stream = await openai.chat.completions.create({
model: 'qwen3-vl-plus',
messages: messages,
stream: true,
// Note: In the Node.js SDK, non-standard parameters like enableThinking are passed as top-level properties and do not need to be placed in extra_body.
enable_thinking: enableThinking,
thinking_budget: 81920
});
if (enableThinking){console.log('\n' + '='.repeat(20) + 'Thinking process' + '='.repeat(20) + '\n');}
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\nUsage:');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Handle the thinking process
if (delta.reasoning_content) {
process.stdout.write(delta.reasoning_content);
reasoningContent += delta.reasoning_content;
}
// Handle the formal response
else if (delta.content) {
if (!isAnswering) {
console.log('\n' + '='.repeat(20) + 'Full response' + '='.repeat(20) + '\n');
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
} catch (error) {
console.error('Error:', error);
}
}
main();
curl
# ======= Important =======
# The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen3-vl-plus",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
}
},
{
"type": "text",
"text": "Please solve this problem"
}
]
}
],
"stream":true,
"stream_options":{"include_usage":true},
"enable_thinking": true,
"thinking_budget": 81920
}'
DashScope
Python
import os
import dashscope
from dashscope import MultiModalConversation
# If you use a model in the Singapore region, uncomment the following line.
# dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"
enable_thinking=True
messages = [
{
"role": "user",
"content": [
{"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
{"text": "Solve this problem."}
]
}
]
response = MultiModalConversation.call(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv('DASHSCOPE_API_KEY'),
model="qwen3-vl-plus",
messages=messages,
stream=True,
# The enable_thinking parameter enables the thinking process.
# For qwen3-vl-plus and qwen3-vl-flash, you can enable or disable thinking with enable_thinking. For models with the `thinking` suffix, such as qwen3-vl-235b-a22b-thinking, enable_thinking can only be set to true. This does not apply to other Qwen-VL models.
enable_thinking=enable_thinking,
# The thinking_budget parameter sets the maximum number of tokens for the reasoning process.
thinking_budget=81920,
)
# Define the full thinking process
reasoning_content = ""
# Define the full response
answer_content = ""
# Determine if the thinking process has ended and the response has started
is_answering = False
if enable_thinking:
print("=" * 20 + "Thinking process" + "=" * 20)
for chunk in response:
# If both the thinking process and the response are empty, ignore
message = chunk.output.choices[0].message
reasoning_content_chunk = message.get("reasoning_content", None)
if (chunk.output.choices[0].message.content == [] and
reasoning_content_chunk == ""):
pass
else:
# If it is currently in the thinking process
if reasoning_content_chunk is not None and chunk.output.choices[0].message.content == []:
print(chunk.output.choices[0].message.reasoning_content, end="")
reasoning_content += chunk.output.choices[0].message.reasoning_content
# If it is currently responding
elif chunk.output.choices[0].message.content != []:
if not is_answering:
print("\n" + "=" * 20 + "Full response" + "=" * 20)
is_answering = True
print(chunk.output.choices[0].message.content[0]["text"], end="")
answer_content += chunk.output.choices[0].message.content[0]["text"]
# To print the full thinking process and the full response, uncomment the following code and run it.
# print("=" * 20 + "Full thinking process" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "Full response" + "=" * 20 + "\n")
# print(f"{answer_content}")// DashScope SDK version >= 2.21.10
import java.util.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.exception.InputRequiredException;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}
private static final Logger logger = LoggerFactory.getLogger(Main.class);
private static StringBuilder reasoningContent = new StringBuilder();
private static StringBuilder finalContent = new StringBuilder();
private static boolean isFirstPrint = true;
private static void handleGenerationResult(MultiModalConversationResult message) {
String re = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
String reasoning = Objects.isNull(re)?"":re; // default value
List<Map<String, Object>> content = message.getOutput().getChoices().get(0).getMessage().getContent();
if (!reasoning.isEmpty()) {
reasoningContent.append(reasoning);
if (isFirstPrint) {
System.out.println("====================Thinking process====================");
isFirstPrint = false;
}
System.out.print(reasoning);
}
if (Objects.nonNull(content) && !content.isEmpty()) {
Object text = content.get(0).get("text");
finalContent.append(content.get(0).get("text"));
if (!isFirstPrint) {
System.out.println("\n====================Full response====================");
isFirstPrint = true;
}
System.out.print(text);
}
}
public static MultiModalConversationParam buildMultiModalConversationParam(MultiModalMessage Msg) {
return MultiModalConversationParam.builder()
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
// The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-vl-plus")
.messages(Arrays.asList(Msg))
.enableThinking(true)
.thinkingBudget(81920)
.incrementalOutput(true)
.build();
}
public static void streamCallWithMessage(MultiModalConversation conv, MultiModalMessage Msg)
throws NoApiKeyException, ApiException, InputRequiredException, UploadFileException {
MultiModalConversationParam param = buildMultiModalConversationParam(Msg);
Flowable<MultiModalConversationResult> result = conv.streamCall(param);
result.blockingForEach(message -> {
handleGenerationResult(message);
});
}
public static void main(String[] args) {
try {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMsg = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("image", "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"),
Collections.singletonMap("text", "Please solve this problem")))
.build();
streamCallWithMessage(conv, userMsg);
// Print the final result
// if (reasoningContent.length() > 0) {
// System.out.println("\n====================Full response====================");
// System.out.println(finalContent.toString());
// }
} catch (ApiException | NoApiKeyException | UploadFileException | InputRequiredException e) {
logger.error("An exception occurred: {}", e.getMessage());
}
System.exit(0);
}
}
curl
# ======= Important =======
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
"model": "qwen3-vl-plus",
"input":{
"messages":[
{
"role": "user",
"content": [
{"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
{"text": "Please solve this problem"}
]
}
]
},
"parameters":{
"enable_thinking": true,
"incremental_output": true,
"thinking_budget": 81920
}
}'
Enable high-resolution mode
For images that contain a large amount of detail, you can set the vl_high_resolution_images parameter to True to enable high-resolution mode. When this mode is enabled, the token limit per image increases from the default of 2560 (Qwen3-VL) or 1280 (other Qwen-VL models) to 16384.
Mode | Token limit per image | Scenarios | Cost and latency |
Enabled (vl_high_resolution_images=true) | 16384 | Scenarios with rich content that require attention to detail | Higher |
Disabled (vl_high_resolution_images=false) | Qwen3-VL: 2560; other Qwen-VL models: 1280 | Scenarios with fewer details, high speed requirements, or cost sensitivity | Lower |
OpenAI compatible
vl_high_resolution_images is not a standard OpenAI parameter. If you use the OpenAI Python SDK, pass it through extra_body.
Python
import os
import time
from openai import OpenAI
client = OpenAI(
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
def test_resolution(high_resolution=False):
completion = client.chat.completions.create(
model="qwen3-vl-plus",
messages=[
{"role": "user","content": [
{"type": "image_url","image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250212/earbrt/vcg_VCG211286867973_RF.jpg"},},
{"type": "text", "text": "What do these images depict?"},
],
}
],
extra_body={'enable_thinking': False,
"vl_high_resolution_images":high_resolution}
)
usage_info= completion.usage.prompt_tokens
return {
'usage_info': usage_info
}
# Print the comparison result
print("\n==================== Token Usage Comparison ====================")
# Test low resolution
result_low = test_resolution(high_resolution=False)
# Wait a moment to avoid API limits
time.sleep(2)
# Test high resolution
result_high = test_resolution(high_resolution=True)
if result_low['usage_info'] and result_high['usage_info']:
low_tokens = result_low['usage_info']
high_tokens = result_high['usage_info']
print(f"High-resolution mode - Total input tokens: {high_tokens}")
print(f"Low-resolution mode - Total input tokens: {low_tokens}")
print(f"Difference: {high_tokens - low_tokens} tokens")Node.js
import OpenAI from "openai";
const openai = new OpenAI(
{
// The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
// The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
async function test_resolution(high_resolution) {
const response = await openai.chat.completions.create({
model: "qwen3-vl-plus",
messages: [
{role: "user",content: [
{type: "image_url",image_url: {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250212/earbrt/vcg_VCG211286867973_RF.jpg"}},
{type: "text", text: "What do these images depict?" },
]}],
enable_thinking: false,
vl_high_resolution_images:high_resolution
});
return response.usage.prompt_tokens;
}
// Test low and high resolution
(async function main() {
console.log("\n==================== Token Usage Comparison ====================")
const result_low = await test_resolution(false);
const result_high = await test_resolution(true);
console.log("High resolution - Total input tokens:",result_high);
console.log("Low resolution - Total input tokens:", result_low);
console.log("Difference:", result_high-result_low);
})();
curl
# ======= Important =======
# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following base_url is for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250212/earbrt/vcg_VCG211286867973_RF.jpg"
}
},
{
"type": "text",
"text": "What do these images depict?"
}
]
}
],
"enable_thinking": false,
"vl_high_resolution_images":true
}'
DashScope
Python
import os
import time
import dashscope
# The following is the base_url for the Singapore region. To use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{
"role": "user",
"content": [
{"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250212/earbrt/vcg_VCG211286867973_RF.jpg"},
{"text": "What does this image show?"}
]
}
]
def test_resolution(high_resolution=False):
"""Tests the results of different resolution settings."""
response = dashscope.MultiModalConversation.call(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen3-vl-plus',
messages=messages,
enable_thinking=False,
vl_high_resolution_images=high_resolution
)
return {
'usage_info': response.usage
}
# Print the comparison results.
print("\n==================== Token Usage Comparison ====================")
# Test low resolution.
result_low = test_resolution(high_resolution=False)
# Wait for a moment to avoid API rate limits.
time.sleep(2)
# Test high resolution.
result_high = test_resolution(high_resolution=True)
if result_low['usage_info'] and result_high['usage_info']:
low_tokens = result_low['usage_info'].input_tokens_details['image_tokens']
high_tokens = result_high['usage_info'].input_tokens_details['image_tokens']
print(f"High-resolution mode - Image tokens: {high_tokens}")
print(f"Low-resolution mode - Image tokens: {low_tokens}")
print(f"Difference: {high_tokens - low_tokens} tokens ")Java
// DashScope SDK version >= 2.21.10
import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the base_url for the Singapore region. To use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static Integer simpleMultiModalConversationCall(boolean highResolution)
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250212/earbrt/vcg_VCG211286867973_RF.jpg"),
Collections.singletonMap("text", "What does this image show?"))).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-vl-plus")
.enableThinking(false)
.messages(Arrays.asList(userMessage))
.vlHighResolutionImages(highResolution)
.build();
MultiModalConversationResult result = conv.call(param);
return result.getUsage().getImageTokens();
}
public static void main(String[] args) {
try {
// Invoke the high-resolution pattern.
Integer highResToken = simpleMultiModalConversationCall(true);
// Invoke the low-resolution pattern.
Integer lowResToken = simpleMultiModalConversationCall(false);
// Print the comparison results.
System.out.println("=== Token Usage Comparison ===");
System.out.println("High-resolution pattern token count: " + highResToken);
System.out.println("Low-resolution pattern token count: " + lowResToken);
System.out.println("Difference: " + (highResToken - lowResToken));
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the base_url for the Singapore region. To use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before execution. ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"input":{
"messages":[
{
"role": "user",
"content": [
{"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250212/earbrt/vcg_VCG211286867973_RF.jpg"},
{"text": "What does this image show?"}
]
}
]
},
"parameters": {
"vl_high_resolution_images": true,
"enable_thinking": false
}
}'
Video understanding
The Qwen-VL model can understand video content from a list of images (video frames) or a video file.
For optimal performance when understanding video files, use the latest version or a recent snapshot version of the model.
Video files
Video frame extraction
The Qwen-VL model understands video content by extracting frames. The frame extraction frequency determines the level of detail in the analysis. The method for controlling this frequency differs between software development kits (SDKs):
Use the DashScope SDK: The fps parameter controls the frame extraction rate. It specifies that one frame is extracted every 1/fps seconds. The default value is 2.0, and the valid range is (0.1, 10). Use a higher fps value for fast-moving scenarios and a lower fps value for static scenes or long videos.
Use the OpenAI compatible SDK: The frame extraction frequency is fixed at one frame every 0.5 seconds and cannot be customized.
The following code samples show how to understand an online video specified by a URL. For more information, see how to upload a local file.
OpenAI compatible
When you directly input a video file to the Qwen-VL model using the OpenAI SDK or HTTP, set the "type" parameter in the user message to "video_url".
Python
import os
from openai import OpenAI
client = OpenAI(
# API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
# If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-vl-plus",
messages=[
{"role": "user","content": [{
# When passing a video file directly, set the value of type to video_url.
# When using the OpenAI SDK, one frame is extracted from the video every 0.5 seconds by default. This frequency cannot be changed. To customize the frame rate, use the DashScope SDK.
"type": "video_url",
"video_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"}},
{"type": "text","text": "What is the content of this video?"}]
}]
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
// If the environment variable is not configured, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
async function main() {
const response = await openai.chat.completions.create({
model: "qwen3-vl-plus",
messages: [
{role: "user",content: [
// When passing a video file directly, set the value of type to video_url.
// When using the OpenAI SDK, one frame is extracted from the video every 0.5 seconds by default. This frequency cannot be changed. To customize the frame rate, use the DashScope SDK.
{type: "video_url", video_url: {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"}},
{type: "text", text: "What is the content of this video?" },
]}]
});
console.log(response.choices[0].message.content);
}
main()
curl
# ======= Important Note =======
# This base_url is for the Singapore region. To use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"messages": [
{"role": "user","content": [{"type": "video_url","video_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"}},
{"type": "text","text": "What is the content of this video?"}]}]
}'
DashScope
Python
import dashscope
import os
# The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{"role": "user",
"content": [
# The fps parameter controls the video frame extraction frequency. It specifies that one frame is extracted every 1/fps seconds. For complete usage, see: https://www.alibabacloud.com/help/en/model-studio/use-qwen-by-calling-api?#2ed5ee7377fum
{"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4","fps":2},
{"text": "What is the content of this video?"}
]
}
]
response = dashscope.MultiModalConversation.call(
# API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
# If the environment variable is not configured, replace the following line with your Model Studio API key: api_key ="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen3-vl-plus',
messages=messages
)
print(response.output.choices[0].message.content[0]["text"])
Java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
// The fps parameter controls the video frame extraction frequency. It specifies that one frame is extracted every 1/fps seconds. For complete usage, see: https://www.alibabacloud.com/help/en/model-studio/use-qwen-by-calling-api?#2ed5ee7377fum
Map<String, Object> params = new HashMap<>();
params.put("video", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4");
params.put("fps", 2);
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
params,
Collections.singletonMap("text", "What is the content of this video?"))).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// If you use a model in the China (Beijing) region, you must use an API key for the China (Beijing) region. Get an API key: https://bailian.console.alibabacloud.com/?tab=model#/api-key
// If the environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-vl-plus")
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
# ======= Important =======
# The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"input":{
"messages":[
{"role": "user","content": [{"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4","fps":2},
{"text": "What is the content of this video?"}]}]}
}'
Image lists
Image list limits
Qwen3-VL, Qwen2.5-VL, and QVQ series models: A minimum of 4 images and a maximum of 512 images.
Other models: A minimum of 4 images and a maximum of 80 images.
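Before sending a request, you can enforce these bounds client-side. The following helper is a sketch, not part of any SDK; the default cap of 512 assumes a Qwen3-VL, Qwen2.5-VL, or QVQ series model and should be lowered to 80 for other models.
# Illustrative pre-flight check for an image list passed as video frames.
def validate_frame_list(frames, max_images=512):
    if len(frames) < 4:
        raise ValueError(f"need at least 4 images, got {len(frames)}")
    if len(frames) > max_images:
        raise ValueError(f"at most {max_images} images allowed, got {len(frames)}")

validate_frame_list(["f1.jpg", "f2.jpg", "f3.jpg", "f4.jpg"])  # passes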
Video frame extraction
When you provide a video as a list of images (pre-extracted video frames), you can use the fps parameter to inform the model of the time interval between frames. This helps the model better understand the sequence, duration, and dynamics of events.
Use the DashScope SDK: You can use the fps parameter to indicate the frame rate at which the image list was extracted from the original video, that is, one frame every 1/fps seconds. For example, frames sampled every 0.5 seconds correspond to fps=2. This parameter is supported by the Qwen2.5-VL and Qwen3-VL models.
Use an OpenAI compatible SDK: The fps parameter is not supported. The model assumes the frames were extracted at a rate of one frame every 0.5 seconds.
The following code samples show how to understand online video frames specified by URLs. For more information, see how to upload local files.
OpenAI compatible
When you input a video as a list of images to the Qwen-VL model using the OpenAI SDK or HTTP, set the "type" parameter in the user message to "video".
Python
import os
from openai import OpenAI
client = OpenAI(
# API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
# If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-vl-plus", # This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see: https://www.alibabacloud.com/help/en/model-studio/models-and-pricing
messages=[{"role": "user","content": [
# When passing a list of images, the "type" parameter in the user message is "video".
# When using the OpenAI SDK, the image list is assumed to be extracted from the video at a rate of one frame every 0.5 seconds. This cannot be changed. To customize the frame rate, use the DashScope SDK.
{"type": "video","video": ["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"]},
{"type": "text","text": "Describe the specific process in this video"},
]}]
)
print(completion.choices[0].message.content)
Node.js
// Make sure you have specified "type": "module" in your package.json file.
import OpenAI from "openai";
const openai = new OpenAI({
// API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
// If the environment variable is not configured, replace the following line with your Model Studio API key: apiKey: "sk-xxx",
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
});
async function main() {
const response = await openai.chat.completions.create({
model: "qwen3-vl-plus", // This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see: https://www.alibabacloud.com/help/en/model-studio/models-and-pricing
messages: [{
role: "user",
content: [
{
// When passing a list of images, the "type" parameter in the user message is "video".
// When using the OpenAI SDK, the image list is assumed to be extracted from the video at a rate of one frame every 0.5 seconds. This cannot be changed. To customize the frame rate, use the DashScope SDK.
type: "video",
video: [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
]
},
{
type: "text",
text: "Describe the specific process in this video"
}
]
}]
});
console.log(response.choices[0].message.content);
}
main();
curl
# ======= Important =======
# API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
# The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"messages": [{"role": "user",
"content": [{"type": "video",
"video": ["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"]},
{"type": "text",
"text": "Describe the specific process in this video"}]}]
}'
DashScope
Python
import os
import dashscope
# The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [{"role": "user",
"content": [
# If the model is from the Qwen2.5-VL or Qwen3-VL series and you pass a list of images, you can set the fps parameter. This indicates that the image list was extracted from the original video at a rate of one frame every 1/fps seconds.
{"video":["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
"fps":2},
{"text": "Describe the specific process in this video"}]}]
response = dashscope.MultiModalConversation.call(
# API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
# If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
model='qwen3-vl-plus', # This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see: https://www.alibabacloud.com/help/en/model-studio/models-and-pricing
messages=messages
)
print(response.output.choices[0].message.content[0]["text"])
Java
// DashScope SDK version 2.18.3 or later is required.
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static final String MODEL_NAME = "qwen3-vl-plus"; // This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see: https://www.alibabacloud.com/help/en/model-studio/models-and-pricing
public static void videoImageListSample() throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
// If the model is from the Qwen2.5-VL or Qwen3-VL series and you pass a list of images, you can set the fps parameter. This indicates that the image list was extracted from the original video at a rate of one frame every 1/fps seconds.
Map<String, Object> params = Map.of(
"video", Arrays.asList("https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"),
"fps",2);
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
params,
Collections.singletonMap("text", "Describe the specific process in this video")))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
// If the environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(MODEL_NAME)
.messages(Arrays.asList(userMessage)).build();
MultiModalConversationResult result = conv.call(param);
System.out.print(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
videoImageListSample();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
# ======= Important =======
# The following is the base_url for the Singapore region. If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"input": {
"messages": [
{
"role": "user",
"content": [
{
"video": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
],
"fps":2
},
{
"text": "Describe the specific process in this video"
}
]
}
]
}
}'
Upload local files (Base64 encoding or file path)
Qwen-VL provides two methods to upload local files:
Upload using Base64 encoding
Upload directly using a file path (more stable and recommended)
Upload methods
Upload using Base64 encoding
Convert the file to a Base64-encoded string and pass it to the model. This method is compatible with the OpenAI compatible mode, the DashScope software development kit (SDK), and HTTP requests.
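When you build the Base64 string, the data URL prefix (data:image/...;base64 or data:video/...;base64) must match the file's actual format. The following is a minimal sketch using Python's standard library; the helper name is an assumption, not an SDK function.
import base64
import mimetypes

# Illustrative helper: build a data URL whose MIME type matches the file.
def to_data_url(path):
    mime, _ = mimetypes.guess_type(path)  # for example "image/png" or "video/mp4"
    if mime is None:
        raise ValueError(f"cannot determine the MIME type of {path}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"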
Upload using a file path
Directly pass the local file path to the model. This method is supported only by the DashScope Python and Java SDKs. It is not supported for DashScope HTTP requests or the OpenAI compatible mode.
Specify the file path in the format required by your programming language and operating system.
Limits:
For better stability, we recommend uploading files using a file path. For files smaller than 1 MB, you can use Base64 encoding.
When you pass a file path directly, a single image or video frame (from an image list) must be smaller than 10 MB, and a single video must be smaller than 100 MB.
When you use Base64 encoding, the encoded data size increases. Ensure that a single encoded image or video is smaller than 10 MB.
To compress a file, see How do I compress an image or video to meet the size requirements?
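Taken together, these limits suggest a simple client-side dispatch. The following sketch is illustrative; the thresholds mirror the limits above, and the helper name is an assumption.
import os

# Illustrative: choose an upload method based on the documented size limits.
def choose_upload_method(path, is_video=False):
    mb = 1024 * 1024
    size = os.path.getsize(path)
    limit = 100 * mb if is_video else 10 * mb  # file-path limits
    if size >= limit:
        raise ValueError(f"{path} exceeds the {limit // mb} MB limit; compress it first")
    return "base64" if size < 1 * mb else "file_path"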
Images
Upload using a file path
Passing a file path is supported only when you call the model using the DashScope Python and Java SDKs. This method is not supported for DashScope HTTP requests or the OpenAI compatible mode.
Python
import os
import dashscope
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Replace xxx/eagle.png with the absolute path of your local image
local_path = "xxx/eagle.png"
image_path = f"file://{local_path}"
messages = [
{'role':'user',
'content': [{'image': image_path},
{'text': 'What is depicted in the image?'}]}]
response = dashscope.MultiModalConversation.call(
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen3-vl-plus', # This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
messages=messages)
print(response.output.choices[0].message.content[0]["text"])
Java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void callWithLocalFile(String localPath)
throws ApiException, NoApiKeyException, UploadFileException {
String filePath = "file://"+localPath;
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(new HashMap<String, Object>(){{put("image", filePath);}},
new HashMap<String, Object>(){{put("text", "What is depicted in the image?");}})).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If the environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-vl-plus") // This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));}
public static void main(String[] args) {
try {
// Replace xxx/eagle.png with the absolute path of your local image
callWithLocalFile("xxx/eagle.png");
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Upload using Base64 encoding
OpenAI compatible
Python
from openai import OpenAI
import os
import base64
# Encoding function: Converts a local file to a Base64-encoded string.
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
# Replace xxx/eagle.png with the absolute path of your local image
base64_image = encode_image("xxx/eagle.png")
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-vl-plus", # This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
# When passing Base64 image data, note that the image format (image/{format}) must match the Content Type in the list of supported images. The "f" prefix denotes a Python f-string.
# PNG image: f"data:image/png;base64,{base64_image}"
# JPEG image: f"data:image/jpeg;base64,{base64_image}"
# WEBP image: f"data:image/webp;base64,{base64_image}"
"image_url": {"url": f"data:image/png;base64,{base64_image}"},
},
{"type": "text", "text": "What is depicted in the image?"},
],
}
],
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If the environment variable is not configured, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const encodeImage = (imagePath) => {
const imageFile = readFileSync(imagePath);
return imageFile.toString('base64');
};
// Replace xxx/eagle.png with the absolute path of your local image
const base64Image = encodeImage("xxx/eagle.png")
async function main() {
const completion = await openai.chat.completions.create({
model: "qwen3-vl-plus", // This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
messages: [
{"role": "user",
"content": [{"type": "image_url",
// Note: When passing Base64 data, the image format (image/{format}) must match the Content Type in the list of supported images.
// PNG image: data:image/png;base64,${base64Image}
// JPEG image: data:image/jpeg;base64,${base64Image}
// WEBP image: data:image/webp;base64,${base64Image}
"image_url": {"url": `data:image/png;base64,${base64Image}`},},
{"type": "text", "text": "What is depicted in the image?"}]}]
});
console.log(completion.choices[0].message.content);
}
main();
curl
For the method to convert a file to a Base64-encoded string, see the example code.
For demonstration purposes, the Base64 string in the code,
"data:image/jpg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA...", is truncated. In actual use, be sure to pass the complete encoded string.
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen3-vl-plus",
"messages": [
{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": "data:image/jpg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA"}},
{"type": "text", "text": "What is depicted in the image?"}
]
}]
}'
DashScope
Python
import base64
import os
import dashscope
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Encoding function: Converts a local file to a Base64-encoded string.
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
# Replace xxxx/eagle.png with the absolute path of your local image
base64_image = encode_image("xxxx/eagle.png")
messages = [
{
"role": "user",
"content": [
# Note: When passing Base64 data, the image format (image/{format}) must match the Content Type in the list of supported images. The "f" prefix denotes a Python f-string.
# PNG image: f"data:image/png;base64,{base64_image}"
# JPEG image: f"data:image/jpeg;base64,{base64_image}"
# WEBP image: f"data:image/webp;base64,{base64_image}"
{"image": f"data:image/png;base64,{base64_image}"},
{"text": "What is depicted in the image?"},
],
},
]
response = dashscope.MultiModalConversation.call(
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen3-vl-plus", # This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
messages=messages,
)
print(response.output.choices[0].message.content[0]["text"])
Java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Base64;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import com.alibaba.dashscope.aigc.multimodalconversation.*;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static String encodeImageToBase64(String imagePath) throws IOException {
Path path = Paths.get(imagePath);
byte[] imageBytes = Files.readAllBytes(path);
return Base64.getEncoder().encodeToString(imageBytes);
}
public static void callWithLocalFile(String localPath) throws ApiException, NoApiKeyException, UploadFileException, IOException {
String base64Image = encodeImageToBase64(localPath); // Base64 encoding
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
new HashMap<String, Object>() {{ put("image", "data:image/png;base64," + base64Image); }},
new HashMap<String, Object>() {{ put("text", "What is depicted in the image?"); }}
)).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-vl-plus")
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
// Replace xxx/eagle.png with the absolute path of your local image
callWithLocalFile("xxx/eagle.png");
} catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
For the method to convert a file to a Base64-encoded string, see the example code.
For demonstration purposes, the Base64 string in the code,
"data:image/jpg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA...", is truncated. In actual use, be sure to pass the complete encoded string.
# ======= Important =======
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"input":{
"messages":[
{
"role": "user",
"content": [
{"image": "data:image/png;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..."},
{"text": "What is depicted in the image?"}
]
}
]
}
}'
Video files
This section uses a locally saved test.mp4 file as an example.
Upload using a file path
Passing a file path is supported only when you call the model using the DashScope Python and Java SDKs. This method is not supported for DashScope HTTP requests or the OpenAI compatible mode.
Python
import os
import dashscope
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Replace xxx/test.mp4 with the absolute path of your local video
local_path = "xxx/test.mp4"
video_path = f"file://{local_path}"
messages = [
{'role':'user',
# The fps parameter controls the number of frames extracted from the video. It means one frame is extracted every 1/fps seconds.
'content': [{'video': video_path,"fps":2},
{'text': 'What does this video depict?'}]}]
response = dashscope.MultiModalConversation.call(
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen3-vl-plus',
messages=messages)
print(response.output.choices[0].message.content[0]["text"])
Java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void callWithLocalFile(String localPath)
throws ApiException, NoApiKeyException, UploadFileException {
String filePath = "file://"+localPath;
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(new HashMap<String, Object>()
{{
put("video", filePath);// The fps parameter controls the number of frames extracted from the video. It means one frame is extracted every 1/fps seconds.
put("fps", 2);
}},
new HashMap<String, Object>(){{put("text", "What does this video depict?");}})).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If the environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-vl-plus")
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));}
public static void main(String[] args) {
try {
// Replace xxx/test.mp4 with the absolute path of your local video
callWithLocalFile("xxx/test.mp4");
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Upload using Base64 encoding
OpenAI compatible
Python
from openai import OpenAI
import os
import base64
# Encoding function: Converts a local file to a Base64-encoded string.
def encode_video(video_path):
with open(video_path, "rb") as video_file:
return base64.b64encode(video_file.read()).decode("utf-8")
# Replace xxx/test.mp4 with the absolute path of your local video
base64_video = encode_video("xxx/test.mp4")
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-vl-plus",
messages=[
{
"role": "user",
"content": [
{
# When passing a video file directly, set the value of type to video_url.
"type": "video_url",
"video_url": {"url": f"data:video/mp4;base64,{base64_video}"},
},
{"type": "text", "text": "What does this video depict?"},
],
}
],
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If the environment variable is not configured, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const encodeVideo = (videoPath) => {
const videoFile = readFileSync(videoPath);
return videoFile.toString('base64');
};
// Replace xxx/test.mp4 with the absolute path of your local video
const base64Video = encodeVideo("xxx/test.mp4")
async function main() {
const completion = await openai.chat.completions.create({
model: "qwen3-vl-plus",
messages: [
{"role": "user",
"content": [{
// When passing a video file directly, set the value of type to video_url.
"type": "video_url",
"video_url": {"url": `data:video/mp4;base64,${base64Video}`}},
{"type": "text", "text": "What does this video depict?"}]}]
});
console.log(completion.choices[0].message.content);
}
main();
curl
For the method to convert a file to a Base64-encoded string, see the example code.
For demonstration purposes, the Base64 string in the code,
"data:video/mp4;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA...", is truncated. In actual use, be sure to pass the complete encoded string.
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen3-vl-plus",
"messages": [
{
"role": "user",
"content": [
{"type": "video_url", "video_url": {"url": "data:video/mp4;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..."}},
{"type": "text", "text": "What is depicted in the image?"}
]
}]
}'
DashScope
Python
import base64
import os
import dashscope
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Encoding function: Converts a local file to a Base64-encoded string.
def encode_video(video_path):
with open(video_path, "rb") as video_file:
return base64.b64encode(video_file.read()).decode("utf-8")
# Replace xxxx/test.mp4 with the absolute path of your local video
base64_video = encode_video("xxxx/test.mp4")
messages = [{'role':'user',
# The fps parameter controls the number of frames extracted from the video. It means one frame is extracted every 1/fps seconds.
'content': [{'video': f"data:video/mp4;base64,{base64_video}","fps":2},
{'text': 'What does this video depict?'}]}]
response = dashscope.MultiModalConversation.call(
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen3-vl-plus',
messages=messages)
print(response.output.choices[0].message.content[0]["text"])
Java
import java.io.IOException;
import java.util.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import com.alibaba.dashscope.aigc.multimodalconversation.*;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static String encodeVideoToBase64(String videoPath) throws IOException {
Path path = Paths.get(videoPath);
byte[] videoBytes = Files.readAllBytes(path);
return Base64.getEncoder().encodeToString(videoBytes);
}
public static void callWithLocalFile(String localPath)
throws ApiException, NoApiKeyException, UploadFileException, IOException {
String base64Video = encodeVideoToBase64(localPath); // Base64 encoding
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(new HashMap<String, Object>()
{{
put("video", "data:video/mp4;base64," + base64Video);// The fps parameter controls the number of frames extracted from the video. It means one frame is extracted every 1/fps seconds.
put("fps", 2);
}},
new HashMap<String, Object>(){{put("text", "What does this video depict?");}})).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If the environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-vl-plus")
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
// Replace xxx/test.mp4 with the absolute path of your local video
callWithLocalFile("xxx/test.mp4");
} catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
For the method to convert a file to a Base64-encoded string, see the example code.
For demonstration purposes, the Base64 string in the code,
"f"data:video/mp4;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA...", is truncated. In actual use, be sure to pass the complete encoded string.
# ======= Important =======
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"input":{
"messages":[
{
"role": "user",
"content": [
{"video": "data:video/mp4;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..."},
{"text": "What is depicted in the image?"}
]
}
]
}
}'
Image lists
This section uses the locally saved files football1.jpg, football2.jpg, football3.jpg, and football4.jpg as examples.
Upload using a file path
Passing a file path is supported only when you call the model using the DashScope Python and Java SDKs. This method is not supported for DashScope HTTP requests or the OpenAI compatible mode.
Python
import os
import dashscope
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
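# Replace football1.jpg and the following paths with the absolute paths of your local images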
local_path1 = "football1.jpg"
local_path2 = "football2.jpg"
local_path3 = "football3.jpg"
local_path4 = "football4.jpg"
image_path1 = f"file://{local_path1}"
image_path2 = f"file://{local_path2}"
image_path3 = f"file://{local_path3}"
image_path4 = f"file://{local_path4}"
messages = [{'role':'user',
# If the model is from the Qwen2.5-VL or Qwen3-VL series and you pass an image list, you can set the fps parameter. This indicates that the image list was extracted from the original video at a rate of one frame every 1/fps seconds. This setting has no effect on other models.
'content': [{'video': [image_path1,image_path2,image_path3,image_path4],"fps":2},
{'text': 'What does this video depict?'}]}]
response = dashscope.MultiModalConversation.call(
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen3-vl-plus', # This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
messages=messages)
print(response.output.choices[0].message.content[0]["text"])
Java
// DashScope SDK version 2.18.3 or later is required.
import java.util.Arrays;
import java.util.Map;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static final String MODEL_NAME = "qwen3-vl-plus"; // This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
public static void videoImageListSample(String localPath1, String localPath2, String localPath3, String localPath4)
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
String filePath1 = "file://" + localPath1;
String filePath2 = "file://" + localPath2;
String filePath3 = "file://" + localPath3;
String filePath4 = "file://" + localPath4;
Map<String, Object> params = Map.of(
"video", Arrays.asList(filePath1,filePath2,filePath3,filePath4),
// If the model is from the Qwen2.5-VL or Qwen3-VL series and you pass an image list, you can set the fps parameter. This indicates that the image list was extracted from the original video at a rate of one frame every 1/fps seconds. This setting has no effect on other models.
"fps",2);
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(params,
Collections.singletonMap("text", "Describe the process shown in this video")))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If the environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(MODEL_NAME)
.messages(Arrays.asList(userMessage)).build();
MultiModalConversationResult result = conv.call(param);
System.out.print(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
videoImageListSample(
"xxx/football1.jpg",
"xxx/football2.jpg",
"xxx/football3.jpg",
"xxx/football4.jpg");
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Upload using Base64 encoding
OpenAI compatible
Python
import os
from openai import OpenAI
import base64
# Encoding function: Converts a local file to a Base64-encoded string.
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
base64_image1 = encode_image("football1.jpg")
base64_image2 = encode_image("football2.jpg")
base64_image3 = encode_image("football3.jpg")
base64_image4 = encode_image("football4.jpg")
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-vl-plus", # This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
messages=[
{"role": "user","content": [
{"type": "video","video": [
f"data:image/jpeg;base64,{base64_image1}",
f"data:image/jpeg;base64,{base64_image2}",
f"data:image/jpeg;base64,{base64_image3}",
f"data:image/jpeg;base64,{base64_image4}",]},
{"type": "text","text": "Describe the process shown in this video"},
]}]
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If the environment variable is not configured, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const encodeImage = (imagePath) => {
const imageFile = readFileSync(imagePath);
return imageFile.toString('base64');
};
const base64Image1 = encodeImage("football1.jpg")
const base64Image2 = encodeImage("football2.jpg")
const base64Image3 = encodeImage("football3.jpg")
const base64Image4 = encodeImage("football4.jpg")
async function main() {
const completion = await openai.chat.completions.create({
model: "qwen3-vl-plus", // This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
messages: [
{"role": "user",
"content": [{"type": "video",
// Note: When passing Base64 data, the image format (image/{format}) must match the Content Type in the list of supported images.
// PNG image: data:image/png;base64,${base64Image}
// JPEG image: data:image/jpeg;base64,${base64Image}
// WEBP image: data:image/webp;base64,${base64Image}
"video": [
`data:image/jpeg;base64,${base64Image1}`,
`data:image/jpeg;base64,${base64Image2}`,
`data:image/jpeg;base64,${base64Image3}`,
`data:image/jpeg;base64,${base64Image4}`]},
{"type": "text", "text": "What does this video depict?"}]}]
});
console.log(completion.choices[0].message.content);
}
main();
curl
For the method to convert a file to a Base64-encoded string, see the example code.
For demonstration purposes, the Base64 string in the code,
"data:image/jpg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA...", is truncated. In actual use, be sure to pass the complete encoded string.
# ======= Important =======
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"messages": [{"role": "user",
"content": [{"type": "video",
"video": [
"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA...",
"data:image/jpeg;base64,nEpp6jpnP57MoWSyOWwrkXMJhHRCWYeFYb...",
"data:image/jpeg;base64,JHWQnJPc40GwQ7zERAtRMK6iIhnWw4080s...",
"data:image/jpeg;base64,adB6QOU5HP7dAYBBOg/Fb7KIptlbyEOu58..."
]},
{"type": "text",
"text": "Describe the process shown in this video"}]}]
}'
DashScope
Python
import base64
import os
import dashscope
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Encoding function: Converts a local file to a Base64-encoded string.
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
base64_image1 = encode_image("football1.jpg")
base64_image2 = encode_image("football2.jpg")
base64_image3 = encode_image("football3.jpg")
base64_image4 = encode_image("football4.jpg")
messages = [{'role':'user',
'content': [
{'video':
# The format in data:image/{format} must match the actual file type; these files are JPEG.
[f"data:image/jpeg;base64,{base64_image1}",
f"data:image/jpeg;base64,{base64_image2}",
f"data:image/jpeg;base64,{base64_image3}",
f"data:image/jpeg;base64,{base64_image4}"
]
},
{'text': 'Please describe the process shown in this video.'}]}]
response = dashscope.MultiModalConversation.call(
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
model='qwen3-vl-plus', # This example uses qwen3-vl-plus. You can replace it with another model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
messages=messages)
print(response.output.choices[0].message.content[0]["text"])
Java
import java.io.IOException;
import java.util.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import com.alibaba.dashscope.aigc.multimodalconversation.*;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static String encodeImageToBase64(String imagePath) throws IOException {
Path path = Paths.get(imagePath);
byte[] imageBytes = Files.readAllBytes(path);
return Base64.getEncoder().encodeToString(imageBytes);
}
public static void videoImageListSample(String localPath1,String localPath2,String localPath3,String localPath4)
throws ApiException, NoApiKeyException, UploadFileException, IOException {
String base64Image1 = encodeImageToBase64(localPath1); // Base64 encoding
String base64Image2 = encodeImageToBase64(localPath2);
String base64Image3 = encodeImageToBase64(localPath3);
String base64Image4 = encodeImageToBase64(localPath4);
MultiModalConversation conv = new MultiModalConversation();
Map<String, Object> params = Map.of(
"video", Arrays.asList(
"data:image/jpeg;base64," + base64Image1,
"data:image/jpeg;base64," + base64Image2,
"data:image/jpeg;base64," + base64Image3,
"data:image/jpeg;base64," + base64Image4),
// For Qwen2.5-VL series models, you can set the fps parameter when passing an image list. It indicates that one frame was extracted from the original video every 1/fps seconds. The setting has no effect on other models.
"fps",2
);
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(params,
Collections.singletonMap("text", "Describe the process shown in this video")))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-vl-plus")
.messages(Arrays.asList(userMessage))
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
// Replace xxx/football1.jpg and the other paths with the absolute paths of your local images
videoImageListSample(
"xxx/football1.jpg",
"xxx/football2.jpg",
"xxx/football3.jpg",
"xxx/football4.jpg"
);
} catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
To convert a file to a Base64-encoded string, see the example code.
For demonstration purposes, the Base64 strings in the code, such as "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA...", are truncated. In actual use, pass the complete encoded strings.
# ======= Important =======
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"input": {
"messages": [
{
"role": "user",
"content": [
{
"video": [
"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA...",
"data:image/jpeg;base64,nEpp6jpnP57MoWSyOWwrkXMJhHRCWYeFYb...",
"data:image/jpeg;base64,JHWQnJPc40GwQ7zERAtRMK6iIhnWw4080s...",
"data:image/jpeg;base64,adB6QOU5HP7dAYBBOg/Fb7KIptlbyEOu58..."
],
"fps":2
},
{
"text": "Describe the process shown in this video"
}
]
}
]
}
}'
More features
Limits
Input file limits
Image file limits
Supported image formats
Image format | Common extensions | MIME type
BMP | .bmp | image/bmp
JPEG | .jpe, .jpeg, .jpg | image/jpeg
PNG | .png | image/png
TIFF | .tif, .tiff | image/tiff
WEBP | .webp | image/webp
HEIC | .heic | image/heic
Image size: The size of a single image cannot exceed 10 MB. If you pass a Base64-encoded image, ensure that the encoded string is less than 10 MB. For more information, see Upload local files. To learn how to compress the file size, see Image or video compression methods.
Dimensions and aspect ratio: The width and height of the image must each be greater than 10 pixels. The aspect ratio of the image (the ratio of the long edge to the short edge) cannot exceed 200.
Total pixels: The model automatically scales the image, so there is no strict limit on the total number of pixels.
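The following is a minimal pre-flight sketch of these image limits, assuming the Pillow library (pip install pillow) is available; the file name is a placeholder.
import os
from PIL import Image

def validate_image(path, max_bytes=10 * 1024 * 1024):
    # Check an image against the limits above before uploading it.
    if os.path.getsize(path) > max_bytes:
        raise ValueError("image exceeds 10 MB")
    with Image.open(path) as img:
        w, h = img.size
        if min(w, h) <= 10:
            raise ValueError("width and height must each exceed 10 pixels")
        if max(w, h) / min(w, h) > 200:
            raise ValueError("aspect ratio (long edge / short edge) exceeds 200")

validate_image("football1.jpg")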
Video file limits
Video size
Internet URL:
Qwen3-VL and qwen-vl-max (including qwen-vl-max-latest and all versions after qwen-vl-max-2025-04-08): Cannot exceed 2 GB.
Other Qwen2.5-VL series and QVQ series models: Cannot exceed 1 GB.
Other models: Cannot exceed 150 MB.
Base64 encoding: The Base64-encoded video must be smaller than 10 MB.
Local file path: The video file must be smaller than 100 MB.
Video duration
Qwen3-VL, qwen-vl-max, qwen-vl-max-latest, qwen-vl-max-2025-08-13, and qwen-vl-max-2025-04-08: 2 seconds to 20 minutes.
Other Qwen2.5-VL series and QVQ models: 2 seconds to 10 minutes.
Other models: 2 seconds to 40 seconds.
Video formats: MP4, AVI, MKV, MOV, FLV, and WMV.
Video dimensions: There are no specific limits. Before processing, the model resizes video frames to a total of approximately 600,000 pixels, so larger videos do not result in better understanding.
Audio understanding: Audio understanding for video files is not supported.
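As a rough local check of these video limits, the sketch below verifies file size and duration. It assumes the ffprobe tool from FFmpeg is installed, and it uses the 2 GB / 20-minute thresholds that apply to Qwen3-VL with an internet URL; adjust them for other models per the lists above.
import os
import subprocess

def validate_video(path, max_bytes=2 * 1024**3, min_s=2, max_s=20 * 60):
    if os.path.getsize(path) > max_bytes:
        raise ValueError("video exceeds the size limit")
    # ffprobe prints the container duration in seconds
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True)
    duration = float(result.stdout.strip())
    if not (min_s <= duration <= max_s):
        raise ValueError(f"duration {duration:.1f}s is outside the allowed range")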
File input methods
Internet URL: Provide a publicly accessible URL for the file. The HTTP and HTTPS protocols are supported.
Base64 encoding: Convert the file to a Base64-encoded string.
Local file path: Provide the path to the local file directly.
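To illustrate the three methods side by side, here is a sketch of an image field in a DashScope message. The URLs and paths are placeholders, the Base64 string is truncated, and the file:// form requires an absolute path (see Upload local files).
# Internet URL: publicly accessible over HTTP or HTTPS
{"image": "https://example.com/dog_and_girl.jpeg"}
# Base64 encoding: a data URI whose format matches the actual file type
{"image": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."}
# Local file path: an absolute path prefixed with file://
{"image": "file:///home/user/images/dog_and_girl.jpeg"}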
Using in a production environment
Image and video pre-processing: Qwen-VL has a size limit for input files. To learn how to compress files, see Image or video compression methods.
Processing text files: Qwen-VL supports only image files and cannot process text files directly. Consider the following alternatives:
Convert the text file to images: use an image processing library, such as pdf2image for Python, to convert the file page by page into multiple high-quality images, and then pass them to the model using multi-image input. A minimal sketch follows this list.
Alternatively, use Qwen-Long, which can process text files and parse their content directly.
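A minimal sketch of the first alternative, assuming pdf2image and its poppler dependency are installed; report.pdf is a placeholder file name.
from pdf2image import convert_from_path

# Render each PDF page as a high-quality image
pages = convert_from_path("report.pdf", dpi=200)
image_paths = []
for i, page in enumerate(pages):
    path = f"page_{i + 1}.png"
    page.save(path, "PNG")
    image_paths.append(path)
# Pass image_paths to the model as multi-image input (see the examples above)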
Fault tolerance and stability
Timeout handling: In non-streaming calls, a timeout error occurs if the model fails to complete its output within 180 seconds. When a timeout occurs, the partially generated content is returned in the response body, and a response header containing x-dashscope-partialresponse: true indicates the timeout. You can use the partial mode feature, supported by some Qwen-VL models, to append the generated content to the messages array and resend the request so that the model continues generating. For more information, see Continue generation from incomplete output.
Retry mechanism: Implement retry logic for API calls, such as exponential backoff, to handle network fluctuations or temporary service unavailability. A minimal sketch follows this list.
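A minimal retry sketch with exponential backoff, reusing the OpenAI-compatible client from the examples above; the retry count and delays are illustrative choices, not official recommendations.
import time

def call_with_retry(client, messages, model="qwen3-vl-plus", max_retries=3):
    # Retry transient failures, backing off 1s, 2s, then 4s.
    for attempt in range(max_retries + 1):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except Exception:
            if attempt == max_retries:
                raise  # give up after the final attempt
            time.sleep(2 ** attempt)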
Billing and throttling
Throttling: For information about the throttling conditions for the Qwen-VL model, see Throttling.
Free quota (Singapore region only): The Qwen-VL model offers a free quota of 1 million tokens. The 90-day validity period starts when you activate Alibaba Cloud Model Studio or when your model request is approved.
Billing
Total cost = (Number of input tokens × Price per input token) + (Number of output tokens × Price per output token). For input and output prices, see Model List. A toy calculation follows this list.
In thinking mode, the reasoning process (reasoning_content) is part of the output. It is counted as output tokens and billed accordingly. If the reasoning process is not included in the output in thinking mode, billing is based on the non-thinking mode price.
View bills: You can view bills or top up your account on the Expenses and Costs page in the Alibaba Cloud Management Console.
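A toy illustration of the billing formula above. The per-token prices here are made-up placeholders; always take actual rates from the Model List.
input_tokens, output_tokens = 2_000, 500
input_price, output_price = 0.0000008, 0.0000032  # hypothetical USD per token
total_cost = input_tokens * input_price + output_tokens * output_price
print(f"${total_cost:.6f}")  # $0.003200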
API reference
For more information about the input and output parameters of the Qwen-VL model, see Qwen.
FAQ
How do I compress an image or video to the required size?
After the model outputs the object localization results, how do I draw the detection boxes on the original image?
Error codes
If a call fails, see Error messages for troubleshooting.