Qwen-OCR is a visual understanding model designed for text extraction. It extracts text and parses structured data from a wide range of images, such as scanned documents, tables, and receipts. The model supports multiple languages and can perform advanced functions, including information extraction, table parsing, and formula recognition, by using specific task instructions.
You can try Qwen-OCR online in the Playground (Singapore or Beijing).
Examples
| Input image | Recognition result |
| --- | --- |
| Recognize multiple languages | (image omitted) |
| Recognize tilted images | Product Introduction. Filament fiber imported from South Korea. 6941990612023 Item No.: 2023 |
| Locate text positions (high-precision recognition supports text localization) | Localization visualization. For information about how to draw each text line's bounding box on the original image, see the FAQ. |
Models and pricing
International (Singapore)
| Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input price (USD per million tokens) | Output price (USD per million tokens) | Free quota |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen-vl-ocr | Stable | 34,096 | 30,000 (up to 30,000 per image) | 4,096 | $0.72 | $0.72 | 1 million tokens each; valid for 90 days after activation |
| qwen-vl-ocr-2025-11-20 (also qwen-vl-ocr-1120). Based on Qwen3-VL; significantly improves document parsing and text localization. | Snapshot | 38,192 | 30,000 (up to 30,000 per image) | 8,192 | $0.07 | $0.16 | 1 million tokens each; valid for 90 days after activation |
Mainland China (Beijing)
| Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input price (USD per million tokens) | Output price (USD per million tokens) | Free quota |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen-vl-ocr (currently has the same capabilities as qwen-vl-ocr-2025-08-28) | Stable | 34,096 | 30,000 (up to 30,000 per image) | 4,096 | $0.717 | $0.717 | No free quota |
| qwen-vl-ocr-latest (always has the same capabilities as the latest snapshot) | Latest | 38,192 | 30,000 (up to 30,000 per image) | 8,192 | $0.043 | $0.072 | No free quota |
| qwen-vl-ocr-2025-11-20 (also qwen-vl-ocr-1120). Based on Qwen3-VL; significantly improves document parsing and text localization. | Snapshot | 38,192 | 30,000 (up to 30,000 per image) | 8,192 | $0.043 | $0.072 | No free quota |
| qwen-vl-ocr-2025-08-28 (also qwen-vl-ocr-0828) | Snapshot | 34,096 | 30,000 (up to 30,000 per image) | 4,096 | $0.717 | $0.717 | No free quota |
| qwen-vl-ocr-2025-04-13 (also qwen-vl-ocr-0413) | Snapshot | 34,096 | 30,000 (up to 30,000 per image) | 4,096 | $0.717 | $0.717 | No free quota |
| qwen-vl-ocr-2024-10-28 (also qwen-vl-ocr-1028) | Snapshot | 34,096 | 30,000 (up to 30,000 per image) | 4,096 | $0.717 | $0.717 | No free quota |
For qwen-vl-ocr, qwen-vl-ocr-2025-04-13, and qwen-vl-ocr-2025-08-28, the max_tokens parameter (maximum output length) defaults to 4,096. To raise this value into the range of 4,097 to 8,192, send an email to modelstudio@service.aliyun.com with the following information: your Alibaba Cloud account ID, the image type (such as document image, e-commerce image, or contract), the model name, the estimated QPS and total daily requests, and the percentage of requests in which the model output exceeds 4,096 tokens.
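Once a higher limit is approved for your account, max_tokens is passed like any other request parameter. A minimal sketch of an OpenAI-compatible request body that sets it explicitly; the image URL here is a placeholder, and 4,096 is the default cap:

```python
import json

# Request body for POST /compatible-mode/v1/chat/completions.
# max_tokens defaults to 4,096 for these models; values in 4,097-8,192
# only take effect after the limit increase is approved for your account.
payload = {
    "model": "qwen-vl-ocr",
    "max_tokens": 4096,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/ticket.jpg"}},  # placeholder
                {"type": "text",
                 "text": "Please output only the text content from the image."},
            ],
        }
    ],
}
body = json.dumps(payload)
print(json.loads(body)["max_tokens"])  # 4096
```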
Preparation
Create an API key and export it as an environment variable.
Before calling a model through the OpenAI SDK or the DashScope SDK, install the latest version of the SDK. The minimum required versions are 1.22.2 for the DashScope Python SDK and 2.21.8 for the DashScope Java SDK.
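The preparation steps above, sketched as shell commands; the key value is a placeholder:

```shell
# The API keys for the Singapore and Beijing regions are different.
export DASHSCOPE_API_KEY="sk-xxx"  # replace with your Model Studio API key

# Install or upgrade the SDKs (DashScope Python SDK >= 1.22.2 is required).
pip install -U dashscope
pip install -U openai
```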
DashScope SDK
Advantages: Supports all advanced features, such as automatic image rotation and built-in OCR tasks. Offers comprehensive functionality and a simpler way to call the model.
Scenarios: Ideal for projects that require full functionality.
OpenAI-compatible SDK
Advantages: Convenient for users who already work with the OpenAI SDK or its ecosystem tools, enabling quick migration.
Limitations: Advanced features (automatic image rotation and built-in OCR tasks) cannot be invoked directly through parameters. You must simulate them manually by composing complex prompts and parsing the output yourself.
Scenarios: Ideal for projects that already integrate with OpenAI and do not depend on DashScope-exclusive advanced features.
Get started
The following example extracts key information from a train ticket image (passed as a URL) and returns it in JSON format. For more information, see how to pass local files and the image limitations.
OpenAI-compatible
Python
from openai import OpenAI
import os
PROMPT_TICKET_EXTRACTION = """
Please extract the invoice number, train number, departure station, destination station, departure date and time, seat number, seat type, ticket price, ID card number, and passenger name from the train ticket image.
Extract the key information accurately. Do not omit information or fabricate false information. Replace any single character that is blurry or obscured by glare with a question mark (?).
Return the data in JSON format: {'Invoice Number': 'xxx', 'Train Number': 'xxx', 'Departure Station': 'xxx', 'Destination Station': 'xxx', 'Departure Date and Time': 'xxx', 'Seat Number': 'xxx', 'Seat Type': 'xxx', 'Ticket Price': 'xxx', 'ID Card Number': 'xxx', 'Passenger Name': 'xxx'}
"""
try:
    client = OpenAI(
        # The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
        # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-vl-ocr-2025-11-20",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": "https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg"},
                        # The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels exceed min_pixels.
                        "min_pixels": 32 * 32 * 3,
                        # The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are below max_pixels.
                        "max_pixels": 32 * 32 * 8192
                    },
                    # The model supports passing a prompt in the text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
                    {"type": "text",
                     "text": PROMPT_TICKET_EXTRACTION}
                ]
            }
        ])
    print(completion.choices[0].message.content)
except Exception as e:
    print(f"Error message: {e}")
Node.js
import OpenAI from 'openai';
// Define the prompt to extract train ticket information.
const PROMPT_TICKET_EXTRACTION = `
Please extract the invoice number, train number, departure station, destination station, departure date and time, seat number, seat type, ticket price, ID card number, and passenger name from the train ticket image.
Extract the key information accurately. Do not omit information or fabricate false information. Replace any single character that is blurry or obscured by glare with a question mark (?).
Return the data in JSON format: {'Invoice Number': 'xxx', 'Train Number': 'xxx', 'Departure Station': 'xxx', 'Destination Station': 'xxx', 'Departure Date and Time': 'xxx', 'Seat Number': 'xxx', 'Seat Type': 'xxx', 'Ticket Price': 'xxx', 'ID Card Number': 'xxx', 'Passenger Name': 'xxx'}
`;
const openai = new OpenAI({
// The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
// If you have not configured the environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx",
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1',
});
async function main() {
    const response = await openai.chat.completions.create({
        model: 'qwen-vl-ocr-2025-11-20',
        messages: [
            {
                role: 'user',
                content: [
                    // The model supports passing a prompt in the following text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
                    { type: 'text', text: PROMPT_TICKET_EXTRACTION },
                    {
                        type: 'image_url',
                        image_url: {
                            url: 'https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg',
                        },
                        // The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels exceed min_pixels.
                        min_pixels: 32 * 32 * 3,
                        // The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are below max_pixels.
                        max_pixels: 32 * 32 * 8192
                    }
                ]
            }
        ],
    });
    console.log(response.choices[0].message.content)
}
main();
curl
# ======= Important =======
# The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before running ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-vl-ocr-2025-11-20",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {"url":"https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg"},
"min_pixels": 3072,
"max_pixels": 8388608
},
{"type": "text", "text": "Please extract the invoice number, train number, departure station, destination station, departure date and time, seat number, seat type, ticket price, ID card number, and passenger name from the train ticket image. Extract the key information accurately. Do not omit information or fabricate false information. Replace any single character that is blurry or obscured by glare with a question mark (?). Return the data in JSON format: {'Invoice Number': 'xxx', 'Train Number': 'xxx', 'Departure Station': 'xxx', 'Destination Station': 'xxx', 'Departure Date and Time': 'xxx', 'Seat Number': 'xxx', 'Seat Type': 'xxx', 'Ticket Price': 'xxx', 'ID Card Number': 'xxx', 'Passenger Name': 'xxx'}"}
]
}
]
}'
Sample response
DashScope
Python
import os
import dashscope
PROMPT_TICKET_EXTRACTION = """
Please extract the invoice number, train number, departure station, destination station, departure date and time, seat number, seat type, ticket price, ID card number, and passenger name from the train ticket image.
Extract the key information accurately. Do not omit information or fabricate false information. Replace any single character that is blurry or obscured by glare with a question mark (?).
Return the data in JSON format: {'Invoice Number': 'xxx', 'Train Number': 'xxx', 'Departure Station': 'xxx', 'Destination Station': 'xxx', 'Departure Date and Time': 'xxx', 'Seat Number': 'xxx', 'Seat Type': 'xxx', 'Ticket Price': 'xxx', 'ID Card Number': 'xxx', 'Passenger Name': 'xxx'}
"""
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [{
"role": "user",
"content": [{
"image": "https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg",
# The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels exceed min_pixels.
"min_pixels": 32 * 32 * 3,
# The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are below max_pixels.
"max_pixels": 32 * 32 * 8192,
# Enables automatic image rotation.
"enable_rotate": False
},
# When no built-in task is set, you can pass a prompt in the text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
{"type": "text", "text": PROMPT_TICKET_EXTRACTION}]
}]
try:
    response = dashscope.MultiModalConversation.call(
        # The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
        # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
        api_key=os.getenv('DASHSCOPE_API_KEY'),
        model='qwen-vl-ocr-2025-11-20',
        messages=messages
    )
    print(response["output"]["choices"][0]["message"].content[0]["text"])
except Exception as e:
    print(f"An error occurred: {e}")
Java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
Map<String, Object> map = new HashMap<>();
map.put("image", "https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg");
// The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are below max_pixels.
map.put("max_pixels", 8388608);
// The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels exceed min_pixels.
map.put("min_pixels", 3072);
// Enables automatic image rotation.
map.put("enable_rotate", false);
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
map,
// When no built-in task is set, you can pass a prompt in the text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
Collections.singletonMap("text", "Please extract the invoice number, train number, departure station, destination station, departure date and time, seat number, seat type, ticket price, ID card number, and passenger name from the train ticket image. Extract the key information accurately. Do not omit information or fabricate false information. Replace any single character that is blurry or obscured by glare with a question mark (?). Return the data in JSON format: {'Invoice Number': 'xxx', 'Train Number': 'xxx', 'Departure Station': 'xxx', 'Destination Station': 'xxx', 'Departure Date and Time': 'xxx', 'Seat Number': 'xxx', 'Seat Type': 'xxx', 'Ticket Price': 'xxx', 'ID Card Number': 'xxx', 'Passenger Name': 'xxx'}"))).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-vl-ocr-2025-11-20")
.message(userMessage)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
# ======= Important =======
# The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before running ===
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation'\
--header "Authorization: Bearer $DASHSCOPE_API_KEY"\
--header 'Content-Type: application/json'\
--data '{
"model": "qwen-vl-ocr-2025-11-20",
"input": {
"messages": [
{
"role": "user",
"content": [{
"image": "https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg",
"min_pixels": 3072,
"max_pixels": 8388608,
"enable_rotate": false
},
{
"text": "Please extract the invoice number, train number, departure station, destination station, departure date and time, seat number, seat type, ticket price, ID card number, and passenger name from the train ticket image. Extract the key information accurately. Do not omit information or fabricate false information. Replace any single character that is blurry or obscured by glare with a question mark (?). Return the data in JSON format: {'Invoice Number': 'xxx', 'Train Number': 'xxx', 'Departure Station': 'xxx', 'Destination Station': 'xxx', 'Departure Date and Time': 'xxx', 'Seat Number': 'xxx', 'Seat Type': 'xxx', 'Ticket Price': 'xxx', 'ID Card Number': 'xxx', 'Passenger Name': 'xxx'}"
}
]
}
]
}
}'
Use built-in tasks
To simplify calls in specific scenarios, the models (except qwen-vl-ocr-2024-10-28) include several built-in tasks.
How to use them:
- DashScope SDK: You do not need to design and pass a prompt. The model uses a fixed prompt internally; you set the `ocr_options` parameter to invoke a built-in task.
- OpenAI-compatible SDK: You must manually enter the prompt specified for the task.
The following table lists the task value, the specified prompt, the output format, and an example for each built-in task:
High-precision recognition
We recommend that you call the high-precision recognition task with a model version newer than qwen-vl-ocr-2025-08-28, or with the latest version. This task can:
- Recognize text content (extract text)
- Detect text positions (localize text lines and output their coordinates)
For more information about how to draw bounding boxes on the original image after obtaining the coordinates, see the FAQ.
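The FAQ covers drawing the boxes; as a small helper, the four corner points of a rotated rectangle can be reduced to an axis-aligned box before drawing. A minimal sketch in plain Python, assuming each text line comes back as four (x, y) corner points (the exact response schema is described in the FAQ):

```python
def axis_aligned_box(corners):
    """Return (left, top, right, bottom) enclosing a rotated rectangle.

    corners: four (x, y) points, e.g. the rotated-rectangle corners
    returned for one text line by the high-precision recognition task.
    """
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return min(xs), min(ys), max(xs), max(ys)

# Example: a slightly rotated text line.
print(axis_aligned_box([(10, 20), (110, 25), (108, 55), (8, 50)]))  # (8, 20, 110, 55)
```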
| Task value | Specified prompt | Output format and example |
| --- | --- | --- |
| advanced_recognition | Locate all text lines and return the coordinates of the rotated rectangle | |
import os
import dashscope
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [{
"role": "user",
"content": [{
"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg",
# The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels exceed min_pixels.
"min_pixels": 32 * 32 * 3,
# The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are below max_pixels.
"max_pixels": 32 * 32 * 8192,
# Enables automatic image rotation.
"enable_rotate": False}]
}]
response = dashscope.MultiModalConversation.call(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen-vl-ocr-2025-11-20',
messages=messages,
# Set the built-in task to high-precision recognition.
ocr_options={"task": "advanced_recognition"}
)
# The high-precision recognition task returns the result as plain text.
print(response["output"]["choices"][0]["message"].content[0]["text"])
Java
// dashscope SDK version >= 2.21.8
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.aigc.multimodalconversation.OcrOptions;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
Map<String, Object> map = new HashMap<>();
map.put("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg");
// The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are below max_pixels.
map.put("max_pixels", 8388608);
// The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels exceed min_pixels.
map.put("min_pixels", 3072);
// Enables automatic image rotation.
map.put("enable_rotate", false);
// Configure the built-in OCR task.
OcrOptions ocrOptions = OcrOptions.builder()
.task(OcrOptions.Task.ADVANCED_RECOGNITION)
.build();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
map
)).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-vl-ocr-2025-11-20")
.message(userMessage)
.ocrOptions(ocrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
# ======= Important =======
# The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before running ===
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '
{
"model": "qwen-vl-ocr-2025-11-20",
"input": {
"messages": [
{
"role": "user",
"content": [
{
"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg",
"min_pixels": 3072,
"max_pixels": 8388608,
"enable_rotate": false
}
]
}
]
},
"parameters": {
"ocr_options": {
"task": "advanced_recognition"
}
}
}
'
Information extraction
The model supports extracting structured information from documents such as receipts, certificates, and forms, and returns the results in JSON format. You can choose between two modes:
- Custom field extraction: Specify a custom JSON template (`result_schema`) in the `ocr_options.task_config` parameter. The template defines the specific field names (keys) to extract, and the model automatically fills in the corresponding values. Templates support up to three levels of nesting.
- Full field extraction: If the `result_schema` parameter is not specified, the model extracts all fields from the image.
The prompts for the two modes differ:
| Task value | Specified prompt | Output format and example |
| --- | --- | --- |
| key_information_extraction | Custom field extraction and full field extraction use different prompts | |
The following sample code calls the model using the DashScope SDK and over HTTP:
# use [pip install -U dashscope] to update sdk
import os
import dashscope
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{
"role":"user",
"content":[
{
"image":"http://duguang-labelling.oss-cn-shanghai.aliyuncs.com/demo_ocr/receipt_zh_demo.jpg",
"min_pixels": 3072,
"max_pixels": 8388608,
"enable_rotate": False
}
]
}
]
params = {
"ocr_options":{
"task": "key_information_extraction",
"task_config": {
"result_schema": {
"Ride Date": "Corresponds to the ride date and time in the image, in the format YYYY-MM-DD, for example, 2025-03-05",
"Invoice Code": "Extract the invoice code from the image, usually a combination of numbers or letters",
"Invoice Number": "Extract the number from the invoice, usually composed of only digits."
}
}
}
}
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen-vl-ocr-2025-11-20',
messages=messages,
**params)
print(response.output.choices[0].message.content[0]["ocr_result"])
Java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.aigc.multimodalconversation.OcrOptions;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.google.gson.JsonObject;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
Map<String, Object> map = new HashMap<>();
map.put("image", "http://duguang-labelling.oss-cn-shanghai.aliyuncs.com/demo_ocr/receipt_zh_demo.jpg");
// The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are below max_pixels.
map.put("max_pixels", 8388608);
// The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels exceed min_pixels.
map.put("min_pixels", 3072);
// Enables automatic image rotation.
map.put("enable_rotate", false);
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
map
)).build();
// Create the main JSON object.
JsonObject resultSchema = new JsonObject();
resultSchema.addProperty("Ride Date", "Corresponds to the ride date and time in the image, in the format YYYY-MM-DD, for example, 2025-03-05");
resultSchema.addProperty("Invoice Code", "Extract the invoice code from the image, usually a combination of numbers or letters");
resultSchema.addProperty("Invoice Number", "Extract the number from the invoice, usually composed of only digits.");
// Configure the built-in OCR task.
OcrOptions ocrOptions = OcrOptions.builder()
.task(OcrOptions.Task.KEY_INFORMATION_EXTRACTION)
.taskConfig(OcrOptions.TaskConfig.builder()
.resultSchema(resultSchema)
.build())
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-vl-ocr-2025-11-20")
.message(userMessage)
.ocrOptions(ocrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("ocr_result"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
# ======= Important =======
# The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before running ===
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '
{
"model": "qwen-vl-ocr-2025-11-20",
"input": {
"messages": [
{
"role": "user",
"content": [
{
"image": "http://duguang-labelling.oss-cn-shanghai.aliyuncs.com/demo_ocr/receipt_zh_demo.jpg",
"min_pixels": 3072,
"max_pixels": 8388608,
"enable_rotate": false
}
]
}
]
},
"parameters": {
"ocr_options": {
"task": "key_information_extraction",
"task_config": {
"result_schema": {
"Ride Date": "Corresponds to the ride date and time in the image, in the format YYYY-MM-DD, for example, 2025-03-05",
"Invoice Code": "Extract the invoice code from the image, usually a combination of numbers or letters",
"Invoice Number": "Extract the number from the invoice, usually composed of only digits."
}
}
}
}
}
'
If you use the OpenAI SDK or the HTTP method, you must append the custom JSON schema to the end of the prompt string. For more information, see the following sample code:
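A minimal sketch of building such a prompt by hand for the OpenAI-compatible interface. The instruction wording and the field template here are illustrative assumptions, not the model's fixed internal prompt:

```python
import json

# Hypothetical field template: keys are the fields to extract,
# values describe each field for the model.
result_schema = {
    "Ride Date": "The ride date in the image, in YYYY-MM-DD format",
    "Invoice Code": "The invoice code, usually numbers or letters",
}

# Append the custom JSON schema to the end of the prompt string,
# as required when calling via the OpenAI SDK or plain HTTP.
prompt = (
    "Extract the key information from the image and return it as JSON "
    "matching this template: " + json.dumps(result_schema, ensure_ascii=False)
)
print(prompt.endswith("}"))  # True
```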
Table parsing
The model parses table elements in an image and returns the recognition result as text in HTML format.

| Task value | Specified prompt | Output format and example |
| --- | --- | --- |
| table_parsing | | |

The following sample code calls the model using the DashScope SDK and over HTTP:
import os
import dashscope
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [{
"role": "user",
"content": [{
"image": "http://duguang-llm.oss-cn-hangzhou.aliyuncs.com/llm_data_keeper/data/doc_parsing/tables/photo/eng/17.jpg",
# The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels exceed min_pixels.
"min_pixels": 32 * 32 * 3,
# The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are below max_pixels.
"max_pixels": 32 * 32 * 8192,
# Enables automatic image rotation.
"enable_rotate": False}]
}]
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen-vl-ocr-2025-11-20',
messages=messages,
# Set the built-in task to table parsing.
ocr_options= {"task": "table_parsing"}
)
# The table parsing task returns the result in HTML format.
print(response["output"]["choices"][0]["message"].content[0]["text"])
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.aigc.multimodalconversation.OcrOptions;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
Map<String, Object> map = new HashMap<>();
map.put("image", "https://duguang-llm.oss-cn-hangzhou.aliyuncs.com/llm_data_keeper/data/doc_parsing/tables/photo/eng/17.jpg");
// The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are below max_pixels.
map.put("max_pixels", 8388608);
// The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels exceed min_pixels.
map.put("min_pixels",3072);
// Enables automatic image rotation.
map.put("enable_rotate", false);
// Configure the built-in OCR task.
OcrOptions ocrOptions = OcrOptions.builder()
.task(OcrOptions.Task.TABLE_PARSING)
.build();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
map
)).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-vl-ocr-2025-11-20")
.message(userMessage)
.ocrOptions(ocrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
# ======= Important =======
# The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before running ===
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '
{
"model": "qwen-vl-ocr-2025-11-20",
"input": {
"messages": [
{
"role": "user",
"content": [
{
"image": "http://duguang-llm.oss-cn-hangzhou.aliyuncs.com/llm_data_keeper/data/doc_parsing/tables/photo/eng/17.jpg",
"min_pixels": 3072,
"max_pixels": 8388608,
"enable_rotate": false
}
]
}
]
},
"parameters": {
"ocr_options": {
"task": "table_parsing"
}
}
}
'
Document parsing
The model can parse scanned documents or PDF documents saved as images. It recognizes elements such as titles, abstracts, and labels in the file and returns the recognition result as LaTeX-formatted text.
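Because the result is LaTeX text, structural elements such as the title and section headings can be pulled out locally with a regular expression. The following is a minimal sketch, assuming the output uses standard `\title`/`\section` commands; the sample string is illustrative, not a real model response:

```python
import re

# Illustrative stand-in for the LaTeX text returned by the document_parsing task.
latex_result = r"""
\title{Quarterly Report}
\section{Overview}
Revenue grew in Q3.
\section{Outlook}
Guidance unchanged.
"""

# Capture the argument of \title and \section commands.
headings = re.findall(r"\\(?:title|section)\{([^}]*)\}", latex_result)
print(headings)  # ['Quarterly Report', 'Overview', 'Outlook']
```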
task value | Specified prompt | Output format and example |
document_parsing | | LaTeX |
The following are code examples that call the model by using the DashScope SDK and HTTP:
import os
import dashscope
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [{
"role": "user",
"content": [{
"image": "https://img.alicdn.com/imgextra/i1/O1CN01ukECva1cisjyK6ZDK_!!6000000003635-0-tps-1500-1734.jpg",
# The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels exceed min_pixels.
"min_pixels": 32 * 32 * 3,
# The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are below max_pixels.
"max_pixels": 32 * 32 * 8192,
# Enables automatic image rotation.
"enable_rotate": False}]
}]
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen-vl-ocr-2025-11-20',
messages=messages,
# Set the built-in task to document parsing.
ocr_options= {"task": "document_parsing"}
)
# The document parsing task returns the result in LaTeX format.
print(response["output"]["choices"][0]["message"].content[0]["text"])
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.aigc.multimodalconversation.OcrOptions;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
Map<String, Object> map = new HashMap<>();
map.put("image", "https://img.alicdn.com/imgextra/i1/O1CN01ukECva1cisjyK6ZDK_!!6000000003635-0-tps-1500-1734.jpg");
// The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are below max_pixels.
map.put("max_pixels", 8388608);
// The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels exceed min_pixels.
map.put("min_pixels", 3072);
// Enables automatic image rotation.
map.put("enable_rotate", false);
// Configure the built-in OCR task.
OcrOptions ocrOptions = OcrOptions.builder()
.task(OcrOptions.Task.DOCUMENT_PARSING)
.build();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
map
)).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-vl-ocr-2025-11-20")
.message(userMessage)
.ocrOptions(ocrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
# ======= Important =======
# The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before running ===
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen-vl-ocr-2025-11-20",
"input": {
"messages": [
{
"role": "user",
"content": [{
"image": "https://img.alicdn.com/imgextra/i1/O1CN01ukECva1cisjyK6ZDK_!!6000000003635-0-tps-1500-1734.jpg",
"min_pixels": 3072,
"max_pixels": 8388608,
"enable_rotate": false
}
]
}
]
},
"parameters": {
"ocr_options": {
"task": "document_parsing"
}
}
}
'
Formula recognition
The model can parse formulas in an image and return the recognition result as LaTeX-formatted text.
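Before rendering a returned formula, it can be useful to sanity-check the LaTeX locally, for example that braces pair up. The following is a minimal sketch; the sample formula stands in for a real model response:

```python
def braces_balanced(latex: str) -> bool:
    """Quick sanity check that { and } pair up in a LaTeX snippet."""
    depth = 0
    for ch in latex:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

# Illustrative stand-in for the LaTeX returned by the formula_recognition task.
formula = r"\frac{a+b}{2} = \sqrt{ab}"
print(braces_balanced(formula))  # True
```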
task value | Specified prompt | Output format and example |
formula_recognition | | LaTeX |
The following are code examples that call the model by using the DashScope SDK and HTTP:
import os
import dashscope
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [{
"role": "user",
"content": [{
"image": "http://duguang-llm.oss-cn-hangzhou.aliyuncs.com/llm_data_keeper/data/formula_handwriting/test/inline_5_4.jpg",
# The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels exceed min_pixels.
"min_pixels": 32 * 32 * 3,
# The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are below max_pixels.
"max_pixels": 32 * 32 * 8192,
# Enables automatic image rotation.
"enable_rotate": False}]
}]
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen-vl-ocr-2025-11-20',
messages=messages,
# Set the built-in task to formula recognition.
ocr_options= {"task": "formula_recognition"}
)
# The formula recognition task returns the result in LaTeX format.
print(response["output"]["choices"][0]["message"].content[0]["text"])
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.aigc.multimodalconversation.OcrOptions;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
Map<String, Object> map = new HashMap<>();
map.put("image", "http://duguang-llm.oss-cn-hangzhou.aliyuncs.com/llm_data_keeper/data/formula_handwriting/test/inline_5_4.jpg");
// The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are below max_pixels.
map.put("max_pixels", 8388608);
// The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels exceed min_pixels.
map.put("min_pixels", 3072);
// Enables automatic image rotation.
map.put("enable_rotate", false);
// Configure the built-in OCR task.
OcrOptions ocrOptions = OcrOptions.builder()
.task(OcrOptions.Task.FORMULA_RECOGNITION)
.build();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
map
)).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-vl-ocr-2025-11-20")
.message(userMessage)
.ocrOptions(ocrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
# ======= Important =======
# The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before running ===
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '
{
"model": "qwen-vl-ocr",
"input": {
"messages": [
{
"role": "user",
"content": [
{
"image": "http://duguang-llm.oss-cn-hangzhou.aliyuncs.com/llm_data_keeper/data/formula_handwriting/test/inline_5_4.jpg",
"min_pixels": 3072,
"max_pixels": 8388608,
"enable_rotate": false
}
]
}
]
},
"parameters": {
"ocr_options": {
"task": "formula_recognition"
}
}
}
'
General text recognition
General text recognition is mainly used for Chinese and English scenarios and returns the recognition result in plain text format.
task value | Specified prompt | Output format and example |
text_recognition | | Plain text |
The following are code examples that call the model by using the DashScope SDK and HTTP:
import os
import dashscope
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [{
"role": "user",
"content": [{
"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg",
# The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels exceed min_pixels.
"min_pixels": 32 * 32 * 3,
# The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are below max_pixels.
"max_pixels": 32 * 32 * 8192,
# Enables automatic image rotation.
"enable_rotate": False}]
}]
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen-vl-ocr-2025-11-20',
messages=messages,
# Set the built-in task to general text recognition.
ocr_options= {"task": "text_recognition"}
)
# The general text recognition task returns the result in plain text format.
print(response["output"]["choices"][0]["message"].content[0]["text"])
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.aigc.multimodalconversation.OcrOptions;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
Map<String, Object> map = new HashMap<>();
map.put("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg");
// The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are below max_pixels.
map.put("max_pixels", 8388608);
// The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels exceed min_pixels.
map.put("min_pixels", 3072);
// Enables automatic image rotation.
map.put("enable_rotate", false);
// Configure the built-in task.
OcrOptions ocrOptions = OcrOptions.builder()
.task(OcrOptions.Task.TEXT_RECOGNITION)
.build();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
map
)).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-vl-ocr-2025-11-20")
.message(userMessage)
.ocrOptions(ocrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
# ======= Important =======
# The API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before running ===
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen-vl-ocr-2025-11-20",
"input": {
"messages": [
{
"role": "user",
"content": [{
"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg",
"min_pixels": 3072,
"max_pixels": 8388608,
"enable_rotate": false
}
]
}
]
},
"parameters": {
"ocr_options": {
"task": "text_recognition"
}
}
}'
Multilingual recognition
Multilingual recognition is used for scenarios that involve languages other than Chinese and English. The supported languages are Arabic, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, and Vietnamese. The recognition result is returned in plain text format.
task value | Specified prompt | Output format and example |
multi_lan | | Plain text |
The following are code examples that call the model by using the DashScope SDK and HTTP:
import os
import dashscope
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [{
"role": "user",
"content": [{
"image": "https://img.alicdn.com/imgextra/i2/O1CN01VvUMNP1yq8YvkSDFY_!!6000000006629-2-tps-6000-3000.png",
# The minimum pixel threshold for the input image. If an image is smaller than this threshold, it is scaled up until its total number of pixels exceeds `min_pixels`.
"min_pixels": 32 * 32 * 3,
# The maximum pixel threshold for the input image. If an image is larger than this threshold, it is scaled down until its total number of pixels is below `max_pixels`.
"max_pixels": 32 * 32 * 8192,
# Enable the automatic image rotation feature.
"enable_rotate": False}]
}]
response = dashscope.MultiModalConversation.call(
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen-vl-ocr-2025-11-20',
messages=messages,
# Set the built-in task to multilingual recognition.
ocr_options={"task": "multi_lan"}
)
# The multilingual recognition task returns results in plain text.
print(response["output"]["choices"][0]["message"].content[0]["text"])
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.aigc.multimodalconversation.OcrOptions;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
Map<String, Object> map = new HashMap<>();
map.put("image", "https://img.alicdn.com/imgextra/i2/O1CN01VvUMNP1yq8YvkSDFY_!!6000000006629-2-tps-6000-3000.png");
// The maximum pixel threshold for the input image. If the image is larger, it is scaled down until its total pixels are below max_pixels.
map.put("max_pixels", 8388608);
// The minimum pixel threshold for the input image. If the image is smaller, it is scaled up until its total pixels exceed min_pixels.
map.put("min_pixels", 3072);
// Enable the automatic image rotation feature.
map.put("enable_rotate", false);
// Configure the built-in OCR task.
OcrOptions ocrOptions = OcrOptions.builder()
.task(OcrOptions.Task.MULTI_LAN)
.build();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
map
)).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-vl-ocr-2025-11-20")
.message(userMessage)
.ocrOptions(ocrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before execution ===
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '
{
"model": "qwen-vl-ocr-2025-11-20",
"input": {
"messages": [
{
"role": "user",
"content": [
{
"image": "https://img.alicdn.com/imgextra/i2/O1CN01VvUMNP1yq8YvkSDFY_!!6000000006629-2-tps-6000-3000.png",
"min_pixels": 3072,
"max_pixels": 8388608,
"enable_rotate": false
}
]
}
]
},
"parameters": {
"ocr_options": {
"task": "multi_lan"
}
}
}
'
Passing local files (Base64 encoding or file path)
The model supports two methods for uploading local files:
Direct upload using a file path (more stable transfer, recommended)
Upload using Base64 encoding
Upload using a file path
You can pass a local file path directly to the model. This method is supported only by the DashScope Python and Java SDKs. It is not supported by the DashScope HTTP method or the OpenAI-compatible methods.
For more information about how to specify a file path based on your programming language and operating system, see the following table.
Upload using Base64 encoding
You can convert a file into a Base64-encoded string and pass it to the model. This method works with the OpenAI methods, the DashScope SDKs, and HTTP.
Limits
Uploading by file path is recommended for better stability. You can also use Base64 encoding for files smaller than 1 MB.
When you pass a file path directly, each image must be smaller than 10 MB.
When you pass a file using Base64 encoding, the encoded image must be smaller than 10 MB, because Base64 encoding increases the data size.
For more information about how to compress a file, see How do I compress an image to the required size?
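For the Base64 route, read the file as bytes, encode it, and pass the result as a data URL in the `image` field. The following is a minimal sketch; the `data:<mime>;base64,<data>` layout follows the common data-URL convention, and the sample bytes stand in for a real image file:

```python
import base64

def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL suitable for the "image" field."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"

# In practice: image_bytes = open("test.jpg", "rb").read()
image_bytes = b"\xff\xd8\xff\xe0fake-jpeg-bytes"  # illustrative placeholder
data_url = to_data_url(image_bytes)
print(data_url[:40])
```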
Passing a file path
This method is supported only when you call the model by using the DashScope Python and Java SDKs. It is not supported by the DashScope HTTP method or the OpenAI-compatible methods.
Python
import os
import dashscope
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Replace xxx/test.jpg with the absolute path of your local image
local_path = "xxx/test.jpg"
image_path = f"file://{local_path}"
messages = [
{
"role": "user",
"content": [
{
"image": image_path,
# The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until its total pixels exceed min_pixels.
"min_pixels": 32 * 32 * 3,
# The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until its total pixels are below max_pixels.
"max_pixels": 32 * 32 * 8192,
},
# If no built-in task is set for the model, you can pass a prompt in the text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
{
"text": "Extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID number, and passenger name from the train ticket image. Extract the key information accurately. Do not omit or fabricate information. Replace any single character that is blurry or obscured by glare with a question mark (?). Return the data in JSON format as follows: {'Invoice Number': 'xxx', 'Train Number': 'xxx', 'Departure Station': 'xxx', 'Arrival Station': 'xxx', 'Departure Date and Time': 'xxx', 'Seat Number': 'xxx', 'Seat Class': 'xxx', 'Ticket Price': 'xxx', 'ID Number': 'xxx', 'Passenger Name': 'xxx'}"
},
],
}
]
response = dashscope.MultiModalConversation.call(
# API keys are different for the Singapore and Beijing regions. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen-vl-ocr-2025-11-20",
messages=messages,
)
print(response["output"]["choices"][0]["message"].content[0]["text"])
Java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void simpleMultiModalConversationCall(String localPath)
throws ApiException, NoApiKeyException, UploadFileException {
String filePath = "file://"+localPath;
MultiModalConversation conv = new MultiModalConversation();
Map<String, Object> map = new HashMap<>();
map.put("image", filePath);
// The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until its total pixels are below max_pixels.
map.put("max_pixels", 8388608);
// The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until its total pixels exceed min_pixels.
map.put("min_pixels", 3072);
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
map,
// If no built-in task is set for the model, you can pass a prompt in the text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
Collections.singletonMap("text", "Extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID number, and passenger name from the train ticket image. Extract the key information accurately. Do not omit or fabricate information. Replace any single character that is blurry or obscured by glare with a question mark (?). Return the data in JSON format as follows: {'Invoice Number': 'xxx', 'Train Number': 'xxx', 'Departure Station': 'xxx', 'Arrival Station': 'xxx', 'Departure Date and Time': 'xxx', 'Seat Number': 'xxx', 'Seat Class': 'xxx', 'Ticket Price': 'xxx', 'ID Number': 'xxx', 'Passenger Name': 'xxx'}"))).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys are different for the Singapore and Beijing regions. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-vl-ocr-2025-11-20")
.message(userMessage)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
// Replace xxx/test.jpg with the absolute path of your local image
simpleMultiModalConversationCall("xxx/test.jpg");
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Passing Base64 encoding
OpenAI compatible
Python
from openai import OpenAI
import os
import base64
# Read a local file and encode it in Base64 format
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
# Replace xxx/test.png with the absolute path of your local image
base64_image = encode_image("xxx/test.png")
client = OpenAI(
# API keys are different for the Singapore and Beijing regions. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen-vl-ocr-2025-11-20",
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
# Note that when passing a Base64 string, the image format (image/{format}) must match the Content-Type in the list of supported images. "f" is a string formatting method.
# PNG image: f"data:image/png;base64,{base64_image}"
# JPEG image: f"data:image/jpeg;base64,{base64_image}"
# WEBP image: f"data:image/webp;base64,{base64_image}"
"image_url": {"url": f"data:image/png;base64,{base64_image}"},
# The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until its total pixels exceed min_pixels.
"min_pixels": 32 * 32 * 3,
# The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until its total pixels are below max_pixels.
"max_pixels": 32 * 32 * 8192
},
# The model supports passing a prompt in the following text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
{"type": "text", "text": "Extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID number, and passenger name from the train ticket image. Extract the key information accurately. Do not omit or fabricate information. Replace any single character that is blurry or obscured by glare with a question mark (?). Return the data in JSON format as follows: {'Invoice Number': 'xxx', 'Train Number': 'xxx', 'Departure Station': 'xxx', 'Arrival Station': 'xxx', 'Departure Date and Time': 'xxx', 'Seat Number': 'xxx', 'Seat Class': 'xxx', 'Ticket Price': 'xxx', 'ID Number': 'xxx', 'Passenger Name': 'xxx'}"},
],
}
],
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
import {
readFileSync
} from 'fs';
const client = new OpenAI({
// API keys are different for the Singapore and Beijing regions. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
});
// Read a local file and encode it in Base64 format
const encodeImage = (imagePath) => {
const imageFile = readFileSync(imagePath);
return imageFile.toString('base64');
};
// Replace xxx/test.jpg with the absolute path of your local image
const base64Image = encodeImage("xxx/test.jpg")
async function main() {
const completion = await client.chat.completions.create({
model: "qwen-vl-ocr",
messages: [{
"role": "user",
"content": [{
"type": "image_url",
"image_url": {
// Note that when passing a Base64 string, the image format (image/{format}) must match the Content-Type in the list of supported images.
// PNG image: data:image/png;base64,${base64Image}
// JPEG image: data:image/jpeg;base64,${base64Image}
// WEBP image: data:image/webp;base64,${base64Image}
"url": `data:image/jpeg;base64,${base64Image}`
},
// The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until its total pixels exceed min_pixels.
"min_pixels": 32 * 32 * 3,
// The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until its total pixels are below max_pixels.
"max_pixels": 32 * 32 * 8192
},
// The model supports passing a prompt in the following text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
{
"type": "text",
"text": "Extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID number, and passenger name from the train ticket image. Extract the key information accurately. Do not omit or fabricate information. Replace any single character that is blurry or obscured by glare with a question mark (?). Return the data in JSON format as follows: {'Invoice Number': 'xxx', 'Train Number': 'xxx', 'Departure Station': 'xxx', 'Arrival Station': 'xxx', 'Departure Date and Time': 'xxx', 'Seat Number': 'xxx', 'Seat Class': 'xxx', 'Ticket Price': 'xxx', 'ID Number': 'xxx', 'Passenger Name': 'xxx'}"
}
]
}]
});
console.log(completion.choices[0].message.content);
}
main();
curl
For information about how to convert a file to a Base64-encoded string, see the sample code.
For demonstration purposes, the Base64-encoded string "data:image/png;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..." in the code is truncated. In actual use, you must pass the complete encoded string.
# ======= Important =======
# API keys are different for the Singapore and Beijing regions. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen-vl-ocr-latest",
"messages": [
{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": "data:image/png;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..."}},
{"type": "text", "text": "Extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID number, and passenger name from the train ticket image. Extract the key information accurately. Do not omit or fabricate information. Replace any single character that is blurry or obscured by glare with a question mark (?). Return the data in JSON format as follows: {'Invoice Number': 'xxx', 'Train Number': 'xxx', 'Departure Station': 'xxx', 'Arrival Station': 'xxx', 'Departure Date and Time': 'xxx', 'Seat Number': 'xxx', 'Seat Class': 'xxx', 'Ticket Price': 'xxx', 'ID Number': 'xxx', 'Passenger Name': 'xxx'}"}
]
}]
}'
DashScope
Python
import os
import base64
import dashscope
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Base64 encoding format
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
# Replace xxx/test.jpg with the absolute path of your local image
base64_image = encode_image("xxx/test.jpg")
messages = [
{
"role": "user",
"content": [
{
# Note that when passing a Base64 string, the image format (image/{format}) must match the Content-Type in the list of supported images. "f" is a string formatting method.
# PNG image: f"data:image/png;base64,{base64_image}"
# JPEG image: f"data:image/jpeg;base64,{base64_image}"
# WEBP image: f"data:image/webp;base64,{base64_image}"
"image": f"data:image/jpeg;base64,{base64_image}",
# The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until its total pixels exceed min_pixels.
"min_pixels": 32 * 32 * 3,
# The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until its total pixels are below max_pixels.
"max_pixels": 32 * 32 * 8192,
},
# If no built-in task is set for the model, you can pass a prompt in the text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
{
"text": "Extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID number, and passenger name from the train ticket image. Extract the key information accurately. Do not omit or fabricate information. Replace any single character that is blurry or obscured by glare with a question mark (?). Return the data in JSON format as follows: {'Invoice Number': 'xxx', 'Train Number': 'xxx', 'Departure Station': 'xxx', 'Arrival Station': 'xxx', 'Departure Date and Time': 'xxx', 'Seat Number': 'xxx', 'Seat Class': 'xxx', 'Ticket Price': 'xxx', 'ID Number': 'xxx', 'Passenger Name': 'xxx'}"
},
],
}
]
response = dashscope.MultiModalConversation.call(
# API keys are different for the Singapore and Beijing regions. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen-vl-ocr-2025-11-20",
messages=messages,
)
print(response["output"]["choices"][0]["message"].content[0]["text"])
Java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.*;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
// Base64 encoding format
private static String encodeImageToBase64(String imagePath) throws IOException {
Path path = Paths.get(imagePath);
byte[] imageBytes = Files.readAllBytes(path);
return Base64.getEncoder().encodeToString(imageBytes);
}
public static void simpleMultiModalConversationCall(String localPath)
throws ApiException, NoApiKeyException, UploadFileException, IOException {
String base64Image = encodeImageToBase64(localPath); // Base64 encoding
MultiModalConversation conv = new MultiModalConversation();
Map<String, Object> map = new HashMap<>();
map.put("image", "data:image/jpeg;base64," + base64Image);
// The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until its total pixels are below max_pixels.
map.put("max_pixels", 8388608);
// The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until its total pixels exceed min_pixels.
map.put("min_pixels", 3072);
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
map,
// If no built-in task is set for the model, you can pass a prompt in the text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
Collections.singletonMap("text", "Extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID number, and passenger name from the train ticket image. Extract the key information accurately. Do not omit or fabricate information. Replace any single character that is blurry or obscured by glare with a question mark (?). Return the data in JSON format as follows: {'Invoice Number': 'xxx', 'Train Number': 'xxx', 'Departure Station': 'xxx', 'Arrival Station': 'xxx', 'Departure Date and Time': 'xxx', 'Seat Number': 'xxx', 'Seat Class': 'xxx', 'Ticket Price': 'xxx', 'ID Number': 'xxx', 'Passenger Name': 'xxx'}"))).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys are different for the Singapore and Beijing regions. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen-vl-ocr-2025-11-20")
.message(userMessage)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
// Replace xxx/test.jpg with the absolute path of your local image
simpleMultiModalConversationCall("xxx/test.jpg");
} catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
For information about how to convert a file to a Base64-encoded string, see the sample code.
For demonstration purposes, the Base64-encoded string "data:image/png;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..." in the code is truncated. In actual use, you must pass the complete encoded string.
# ======= Important =======
# API keys are different for the Singapore and Beijing regions. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-an-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen-vl-ocr-latest",
"input":{
"messages":[
{
"role": "user",
"content": [
{"image": "data:image/png;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..."},
{"text": "Extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID number, and passenger name from the train ticket image. Extract the key information accurately. Do not omit or fabricate information. Replace any single character that is blurry or obscured by glare with a question mark (?). Return the data in JSON format as follows: {'Invoice Number': 'xxx', 'Train Number': 'xxx', 'Departure Station': 'xxx', 'Arrival Station': 'xxx', 'Departure Date and Time': 'xxx', 'Seat Number': 'xxx', 'Seat Class': 'xxx', 'Ticket Price': 'xxx', 'ID Number': 'xxx', 'Passenger Name': 'xxx'}"}
]
}
]
}
}'
More usage
Limitations
Image limitations
File size: A single image file cannot exceed 10 MB. For Base64-encoded files, the encoded string cannot exceed 10 MB. For more information, see Passing local files.
Dimensions and aspect ratio: The width and height of the image must both be greater than 10 pixels. The aspect ratio cannot exceed 200:1 or 1:200.
Total pixels: The model automatically scales images, so there is no strict limit on the total number of pixels. However, images should not exceed 15.68 million pixels.
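These limits can be checked client-side before a request is sent. The following is a minimal sketch (pure arithmetic on values you already have; the thresholds are taken from the limits listed above):

```python
MAX_FILE_BYTES = 10 * 1024 * 1024   # 10 MB per image file
MIN_SIDE = 10                        # width and height must be > 10 px
MAX_ASPECT = 200                     # aspect ratio within 200:1 / 1:200
MAX_TOTAL_PIXELS = 15_680_000        # recommended ceiling: 15.68 M pixels

def check_image(size_bytes, width, height):
    """Return a list of violated limits (an empty list means the image passes)."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file larger than 10 MB")
    if width <= MIN_SIDE or height <= MIN_SIDE:
        problems.append("width and height must both be greater than 10 px")
    if max(width, height) / min(width, height) > MAX_ASPECT:
        problems.append("aspect ratio exceeds 200:1 or 1:200")
    if width * height > MAX_TOTAL_PIXELS:
        problems.append("more than 15.68 million total pixels")
    return problems
```

Rejecting such images locally avoids a round trip that the service would refuse anyway.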
Image formats
Format | Common extensions | MIME type |
BMP | .bmp | image/bmp |
JPEG | .jpe, .jpeg, .jpg | image/jpeg |
PNG | .png | image/png |
TIFF | .tif, .tiff | image/tiff |
WEBP | .webp | image/webp |
HEIC | .heic | image/heic |
Model limitations
System messages: The model does not support custom system messages because it uses a fixed internal system message. You must pass all instructions through the user message.
No multi-round conversations: The model does not support multi-round conversations and answers only the most recent question.
Hallucination risk: The model may hallucinate if the text in the image is too small or the resolution is too low. In addition, the accuracy of answers to questions unrelated to text extraction is not guaranteed.
Cannot process text files:
Files that contain image data must be converted into a sequence of images before processing. For more information, see the recommendations in Moving to production.
For files that contain plain text or structured data, use Qwen-Long, which can parse long text.
Billing and rate limiting
Billing: Qwen-OCR is a multimodal model. The total cost is calculated as follows: (Number of input tokens × Input unit price) + (Number of output tokens × Output unit price). For information about how image tokens are calculated, see Image token conversion method. You can view your bills or top up your account on the Expenses and Costs page in the Alibaba Cloud Management Console.
Rate limiting: For information about the rate limits of Qwen-OCR, see Rate limits.
Free quota (Singapore region only): Qwen-OCR provides a free quota of 1 million tokens. The quota is valid for 90 days from the date you activate Alibaba Cloud Model Studio or the date your request to use the model is approved.
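The billing formula above is simple arithmetic. A minimal sketch, using the Singapore-region prices listed for qwen-vl-ocr-2025-11-20 in the pricing table ($0.07 input and $0.16 output per million tokens):

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """(Input tokens x input unit price) + (output tokens x output unit price),
    with prices quoted per million tokens."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# Example: a call that consumes 10,000 input and 1,000 output tokens.
cost = request_cost(10_000, 1_000, 0.07, 0.16)
print(f"${cost:.6f}")  # $0.000860
```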
Moving to production
Processing multi-page documents, such as PDFs:
Split: Use an image processing library, such as Python's pdf2image, to convert each page of the PDF file into a high-quality image.
Send the request: Use the multi-image input method for recognition.
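The two steps can be sketched as follows. `pdf_to_images` assumes the third-party pdf2image package (and its poppler dependency) is installed; the prompt and paths are placeholders:

```python
def build_multi_image_message(image_paths, prompt):
    """Assemble a DashScope multi-image user message from local page images."""
    content = [{"image": f"file://{p}"} for p in image_paths]
    content.append({"text": prompt})
    return [{"role": "user", "content": content}]

def pdf_to_images(pdf_path, out_dir, dpi=200):
    """Render each PDF page to a lossless PNG (requires pdf2image + poppler)."""
    from pdf2image import convert_from_path  # third-party, assumed installed
    paths = []
    for i, page in enumerate(convert_from_path(pdf_path, dpi=dpi)):
        p = f"{out_dir}/page_{i:03d}.png"
        page.save(p, "PNG")
        paths.append(p)
    return paths

# Usage sketch: pass the result to dashscope.MultiModalConversation.call
# messages = build_multi_image_message(pdf_to_images("doc.pdf", "/tmp"), "Extract the text")
```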
Image preprocessing:
Make sure the input image is clear, evenly lit, and not overly compressed:
To prevent information loss, use a lossless format, such as PNG, to store and transmit images.
To improve image clarity, apply a denoising algorithm, such as a mean or median filter, to smooth noisy images.
To correct uneven lighting, use an algorithm such as adaptive histogram equalization to adjust brightness and contrast.
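As a toy illustration of the denoising step, here is a minimal standard-library median filter on a grayscale image represented as a list of rows. Production code would instead use a library routine such as OpenCV's cv2.medianBlur, and cv2.createCLAHE for adaptive histogram equalization:

```python
from statistics import median

def median_filter(img, k=3):
    """Apply a k x k median filter to a grayscale image (list of rows of
    pixel values). Border pixels are left unchanged for simplicity."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [row[:] for row in img]                 # copy; borders stay as-is
    for y in range(r, h - r):
        for x in range(r, w - r):
            window = [img[y + dy][x + dx]
                      for dy in range(-r, r + 1)
                      for dx in range(-r, r + 1)]
            out[y][x] = median(window)            # suppresses salt/pepper noise
    return out
```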
For skewed images: Set the enable_rotate: true parameter of the DashScope SDK to significantly improve recognition performance.
For very small or very large images: Use the min_pixels and max_pixels parameters to control the scaling behavior before the image is processed.
min_pixels: Scales up small images to help detect details. We recommend that you keep the default value.
max_pixels: Prevents very large images from consuming excessive resources. The default value is suitable for most scenarios. If small text is not recognized clearly, you can increase max_pixels. Note that this increases token consumption.
Result validation: The model's recognition results may contain errors. For critical business operations, you can implement a manual review process or add validation rules to verify the accuracy of the model's output. For example, use format validation for ID numbers and bank card numbers.
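For example, a lightweight validation pass over the JSON returned by the train-ticket prompt might look like this. The patterns are illustrative assumptions (an 18-character mainland China ID number and a 13-19 digit card number); adapt them to your actual document formats:

```python
import re

# Illustrative patterns -- assumptions, not part of the Qwen-OCR API.
ID_NUMBER = re.compile(r"^\d{17}[\dX]$")   # 18-char mainland China ID format
BANK_CARD = re.compile(r"^\d{13,19}$")     # typical bank card number length

def validate_fields(record):
    """Return the names of fields whose values fail their format rule.
    A '?' in a value means the model flagged an unreadable character."""
    errors = []
    id_no = record.get("ID Number", "")
    if "?" in id_no or not ID_NUMBER.match(id_no):
        errors.append("ID Number")
    return errors
```

Records that fail such checks can be routed to the manual review queue instead of being accepted automatically.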
Batch calls: For large-scale, non-real-time scenarios, use the Batch API to process batch jobs asynchronously at a lower cost.
FAQ
How do I draw detection boxes on the original image after the model outputs text localization results?
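One detail behind that FAQ: if the model processed a scaled copy of the image, the returned box coordinates must be mapped back to the original resolution before drawing. This sketch assumes an [x1, y1, x2, y2] box format; the drawing itself can then be done with, for example, Pillow's ImageDraw.rectangle:

```python
def map_box_to_original(box, model_size, original_size):
    """Rescale an [x1, y1, x2, y2] box from the resized image the model saw
    back to the original image's pixel coordinates."""
    mw, mh = model_size
    ow, oh = original_size
    sx, sy = ow / mw, oh / mh                 # per-axis scale factors
    x1, y1, x2, y2 = box
    return [round(x1 * sx), round(y1 * sy), round(x2 * sx), round(y2 * sy)]
```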
API reference
For the input and output parameters of Qwen-OCR, see the Qwen-OCR API reference.
Error codes
If a call fails, see Error messages for troubleshooting.