Alibaba Cloud Model Studio: Qwen-OCR API reference

Last Updated: Jan 14, 2026

This topic describes the input and output parameters for calling the Qwen-OCR model using the OpenAI-compatible API or the DashScope API.

Reference: Text extraction (Qwen-OCR)

OpenAI compatibility

Singapore region

For SDK calls, set base_url to https://dashscope-intl.aliyuncs.com/compatible-mode/v1.

For HTTP calls, the endpoint is POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions.

US (Virginia) region

For SDK calls, set base_url to https://dashscope-us.aliyuncs.com/compatible-mode/v1.

For HTTP calls, the endpoint is POST https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions.

China (Beijing) region

For SDK calls, set base_url to https://dashscope.aliyuncs.com/compatible-mode/v1.

For HTTP calls, the endpoint is POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions.

Get an API key and set it as an environment variable. If you use the OpenAI SDK, you must also install the SDK.
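
For example, you can add a minimal pre-flight check in Python before making any calls (illustrative only; the environment variable name matches the examples below):

import os

# Fail fast if the API key is missing from the environment.
if not os.getenv("DASHSCOPE_API_KEY"):
    raise RuntimeError("Set the DASHSCOPE_API_KEY environment variable before running the examples.")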

Request body

Non-streaming output

Python

from openai import OpenAI
import os

PROMPT_TICKET_EXTRACTION = """
Please extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID card number, and passenger name from the train ticket image.
You must accurately extract the key information. Do not omit or fabricate information. Replace any single character that is blurry or obscured by strong light with an English question mark (?).
Return the data in JSON format as follows: {'invoice_number': 'xxx', 'departure_station': 'xxx', 'arrival_station': 'xxx', 'departure_date_and_time': 'xxx', 'seat_number': 'xxx', 'ticket_price': 'xxx', 'id_card_number': 'xxx', 'passenger_name': 'xxx'}
"""

try:
    client = OpenAI(
        # API keys for different regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
        # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/compatible-mode/v1
        # If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-vl-ocr-2025-11-20",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url":"https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg"},
                        # The minimum pixel threshold for the input image. If the image has fewer pixels than this value, it is enlarged until its total pixel count exceeds min_pixels.
                        "min_pixels": 32 * 32 * 3,
                        # The maximum pixel threshold for the input image. If the image has more pixels than this value, it is reduced until its total pixel count is below max_pixels.
                        "max_pixels": 32 * 32 * 8192
                    },
                    # The model supports passing a prompt in the following text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
                    {"type": "text",
                     "text": PROMPT_TICKET_EXTRACTION}
                ]
            }
        ])
    print(completion.choices[0].message.content)
except Exception as e:
    print(f"Error message: {e}")

Node.js

import OpenAI from 'openai';

// Define the prompt for extracting train ticket information.
const PROMPT_TICKET_EXTRACTION = `
Please extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID card number, and passenger name from the train ticket image.
You must accurately extract the key information. Do not omit or fabricate information. Replace any single character that is blurry or obscured by strong light with an English question mark (?).
Return the data in JSON format as follows: {'invoice_number': 'xxx', 'departure_station': 'xxx', 'arrival_station': 'xxx', 'departure_date_and_time':'xxx', 'seat_number': 'xxx','ticket_price':'xxx', 'id_card_number': 'xxx', 'passenger_name': 'xxx'}
`;

const client = new OpenAI({
  // API keys for different regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
  // If the environment variable is not configured, replace the following line with your Model Studio API key: apiKey: "sk-xxx",
  apiKey: process.env.DASHSCOPE_API_KEY,
  // The following is the baseURL for the Singapore region. If you use a model in the US (Virginia) region, change the baseURL to https://dashscope-us.aliyuncs.com/compatible-mode/v1
  // If you use a model in the China (Beijing) region, replace the baseURL with: https://dashscope.aliyuncs.com/compatible-mode/v1
  baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1',
});

async function main() {
  const response = await client.chat.completions.create({
    model: 'qwen-vl-ocr-2025-11-20',
    messages: [
      {
        role: 'user',
        content: [
          // The model supports passing a prompt in the text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
          { type: 'text', text: PROMPT_TICKET_EXTRACTION},
          {
            type: 'image_url',
            image_url: {
              url: 'https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg',
            },
            // The minimum pixel threshold for the input image. If the image has fewer pixels than this value, it is enlarged until its total pixel count exceeds min_pixels.
            min_pixels: 32 * 32 * 3,
            // The maximum pixel threshold for the input image. If the image has more pixels than this value, it is reduced until its total pixel count is below max_pixels.
            max_pixels: 32 * 32 * 8192
          }
        ]
      }
    ],
  });
  console.log(response.choices[0].message.content)
}

main();

curl

# ======= Important =======
# API keys for different regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
# The following is the endpoint for the Singapore region. If you use a model in the US (Virginia) region, replace the URL with https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
# If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen-vl-ocr-2025-11-20",
  "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url":"https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg"},
                    "min_pixels": 3072,
                    "max_pixels": 8388608
                },
                {"type": "text", "text": "Please extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID card number, and passenger name from the train ticket image. You must accurately extract the key information. Do not omit or fabricate information. Replace any single character that is blurry or obscured by strong light with an English question mark (?). Return the data in JSON format as follows: {\'invoice_number\': \'xxx\', \'departure_station\': \'xxx\', \'arrival_station\': \'xxx\', \'departure_date_and_time\':\'xxx\', \'seat_number\': \'xxx\',\'ticket_price\':\'xxx\', \'id_card_number\': \'xxx\', \'passenger_name\': \'xxx\'}"}
            ]
        }
    ]
}'

Streaming output

Python

import os
from openai import OpenAI

PROMPT_TICKET_EXTRACTION = """
Please extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID card number, and passenger name from the train ticket image.
You must accurately extract the key information. Do not omit or fabricate information. Replace any single character that is blurry or obscured by strong light with an English question mark (?).
Return the data in JSON format as follows: {'invoice_number': 'xxx', 'departure_station': 'xxx', 'arrival_station': 'xxx', 'departure_date_and_time': 'xxx', 'seat_number': 'xxx', 'ticket_price': 'xxx', 'id_card_number': 'xxx', 'passenger_name': 'xxx'}
"""

client = OpenAI(
    # API keys for different regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
    # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/compatible-mode/v1
    # If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen-vl-ocr-2025-11-20",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url":"https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg"},
                    # The minimum pixel threshold for the input image. If the image has fewer pixels than this value, it is enlarged until its total pixel count exceeds min_pixels.
                    "min_pixels": 32 * 32 * 3,
                    # The maximum pixel threshold for the input image. If the image has more pixels than this value, it is reduced until its total pixel count is below max_pixels.
                    "max_pixels": 32 * 32 * 8192
                },
                # The model supports passing a prompt in the following text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
                {"type": "text", "text": PROMPT_TICKET_EXTRACTION}
            ]
        }
    ],
    stream=True,
    stream_options={"include_usage": True}
)

for chunk in completion:
    print(chunk.model_dump_json())
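
The loop above prints each raw chunk. A common variant, sketched below, accumulates the text deltas instead and reads the token usage from the final chunk, whose choices list is empty when include_usage is enabled:

full_content = ""
for chunk in completion:
    if chunk.choices and chunk.choices[0].delta.content:
        full_content += chunk.choices[0].delta.content
    elif chunk.usage:
        # The last chunk carries usage statistics and no choices.
        print(f"Token usage: {chunk.usage}")
print(f"Full output content: {full_content}")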

Node.js

import OpenAI from 'openai';


// Define the prompt for extracting train ticket information.
const PROMPT_TICKET_EXTRACTION = `
Please extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID card number, and passenger name from the train ticket image.
You must accurately extract the key information. Do not omit or fabricate information. Replace any single character that is blurry or obscured by strong light with an English question mark (?).
Return the data in JSON format as follows: {'invoice_number': 'xxx', 'departure_station': 'xxx', 'arrival_station': 'xxx', 'departure_date_and_time':'xxx', 'seat_number': 'xxx','ticket_price':'xxx', 'id_card_number': 'xxx', 'passenger_name': 'xxx'}
`;

const openai = new OpenAI({
  // API keys for different regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
  // If the environment variable is not configured, replace the following line with your Model Studio API key: apiKey: "sk-xxx",
  apiKey: process.env.DASHSCOPE_API_KEY,
  // The following is the baseURL for the Singapore region. If you use a model in the US (Virginia) region, change the baseURL to https://dashscope-us.aliyuncs.com/compatible-mode/v1
  // If you use a model in the China (Beijing) region, replace the baseURL with: https://dashscope.aliyuncs.com/compatible-mode/v1
  baseURL: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1',
});

async function main() {
  const response = await openai.chat.completions.create({
    model: 'qwen-vl-ocr-2025-11-20',
    messages: [
      {
        role: 'user',
        content: [
          // The model supports passing a prompt in the following text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
          { type: 'text', text: PROMPT_TICKET_EXTRACTION},
          {
            type: 'image_url',
            image_url: {
              url: 'https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg',
            },
            // The minimum pixel threshold for the input image. If the image has fewer pixels than this value, it is enlarged until its total pixel count exceeds min_pixels.
            min_pixels: 32 * 32 * 3,
            // The maximum pixel threshold for the input image. If the image has more pixels than this value, it is reduced until its total pixel count is below max_pixels.
            max_pixels: 32 * 32 * 8192
          }
        ]
      }
    ],
    stream: true,
    stream_options: { include_usage: true }
  });
  let fullContent = "";
  console.log("Streaming output content:");
  for await (const chunk of response) {
    if (chunk.choices[0] && chunk.choices[0].delta.content != null) {
      fullContent += chunk.choices[0].delta.content;
      console.log(chunk.choices[0].delta.content);
    }
  }
  console.log(`Full output content: ${fullContent}`);
}

main();

curl

# ======= Important =======
# API keys for different regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
# The following is the endpoint for the Singapore region. If you use a model in the US (Virginia) region, replace the URL with https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
# If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen-vl-ocr-2025-11-20",
  "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url":"https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg"},
                    "min_pixels": 3072,
                    "max_pixels": 8388608
                },
                {"type": "text", "text": "Please extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID card number, and passenger name from the train ticket image. You must accurately extract the key information. Do not omit or fabricate information. Replace any single character that is blurry or obscured by strong light with an English question mark (?). Return the data in JSON format as follows: {\'invoice_number\': \'xxx\', \'departure_station\': \'xxx\', \'arrival_station\': \'xxx\', \'departure_date_and_time\':\'xxx\', \'seat_number\': \'xxx\',\'ticket_price\':\'xxx\', \'id_card_number\': \'xxx\', \'passenger_name\': \'xxx\'}"}
            ]
        }
    ],
    "stream": true,
    "stream_options": {"include_usage": true}
}'

model string (Required)

The name of the model. See Qwen-OCR for a list of supported models.

messages array (Required)

A sequence of messages that provides context to the model in conversational order.

Message types

User Message object (Required)

A user message that provides instructions and an image for the model to process.

Properties

content array (Required)

The content of the message.

Properties

type string (Required)

Valid values:

  • text

    Set the type to text for text input.

  • image_url

    Use image_url to specify the input image.

text string (Optional)

The input text.

The default value is Please output only the text content from the image without any additional descriptions or formatting. This default behavior instructs the model to extract all text from the image.

image_url object

Information about the input image. This parameter is required if the type is set to image_url.

Properties

url string (Required)

The URL or Base64-encoded Data URL of the image. For more information about passing a local file, see Text extraction.
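
For example, a local file can be converted into a Base64 data URL in Python (a sketch; the file path is a placeholder, and the MIME type must match the actual image format):

import base64

# Read a local image and build a data URL of the form data:<MIME type>;base64,<data>.
with open("ticket.jpg", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")
image_url = {"url": f"data:image/jpeg;base64,{b64_image}"}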

min_pixels integer (Optional)

The minimum pixel threshold for the input image in pixels.

If an input image has a pixel count below min_pixels, it is enlarged until its total pixel count exceeds min_pixels.

Conversion between image tokens and pixels

The number of pixels per image token varies by model:

  • qwen-vl-ocr-latest, qwen-vl-ocr-2025-11-20: Each token corresponds to 32×32 pixels.

  • qwen-vl-ocr, qwen-vl-ocr-2025-08-28, and earlier models: Each token corresponds to 28×28 pixels.

Value range for min_pixels

  • qwen-vl-ocr-latest, qwen-vl-ocr-2025-11-20: The default and minimum value is 3072 (that is, 3×32×32).

  • qwen-vl-ocr, qwen-vl-ocr-2025-08-28, and earlier models: The default and minimum value is 3136 (that is, 4×28×28).

Example: {"type": "image_url","image_url": {"url":"https://xxxx.jpg"},"min_pixels": 3072}

max_pixels integer (Optional)

The maximum pixel threshold for the input image in pixels.

If the pixel count of the input image is within the [min_pixels, max_pixels] range, the model processes the original image without resizing. If the pixel count of the input image exceeds max_pixels, the image is scaled down until its pixel count is less than max_pixels.

Conversion between image tokens and pixels

The number of pixels per image token varies by model:

  • qwen-vl-ocr-latest, qwen-vl-ocr-2025-11-20: Each token corresponds to 32×32 pixels.

  • qwen-vl-ocr, qwen-vl-ocr-2025-08-28, and earlier models: Each token corresponds to 28×28 pixels.

Value range for max_pixels

  • qwen-vl-ocr-latest, qwen-vl-ocr-2025-11-20

    • Default value: 8388608 (that is, 8192×32×32)

    • Maximum value: 30720000 (that is, 30000×32×32)

  • qwen-vl-ocr, qwen-vl-ocr-2025-08-28, and earlier models

    • Default value: 6422528 (that is, 8192×28×28)

    • Maximum value: 23520000 (that is, 30000×28×28)

Example: {"type": "image_url","image_url": {"url":"https://xxxx.jpg"},"max_pixels": 8388608}
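
The scaling rules for min_pixels and max_pixels can be summarized in a short Python sketch (illustrative only; it uses the documented defaults for qwen-vl-ocr-2025-11-20, where each token corresponds to 32×32 pixels):

import math

PIXELS_PER_TOKEN = 32 * 32  # qwen-vl-ocr-2025-11-20; use 28 * 28 for earlier models

def effective_pixels(width, height, min_pixels=3072, max_pixels=8388608):
    """Approximate the pixel count after the documented scaling rules."""
    pixels = width * height
    if pixels < min_pixels:
        return min_pixels   # enlarged until the total exceeds min_pixels
    if pixels > max_pixels:
        return max_pixels   # scaled down until the total is below max_pixels
    return pixels           # within [min_pixels, max_pixels]: processed as-is

def approx_image_tokens(width, height):
    return math.ceil(effective_pixels(width, height) / PIXELS_PER_TOKEN)

print(approx_image_tokens(689, 487))  # the sample ticket image: about 328 tokens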

role string (Required)

The role of the user message. The value must be user.

stream boolean (Optional) Default: false

Specifies whether to return the response in streaming mode.

Valid values:

  • false: Returns the complete response at once after the model finishes generation.

  • true: Returns data blocks as the model generates them. The client must read the blocks sequentially to reconstruct the complete response.

stream_options object (Optional)

The configuration settings for streaming output. This parameter applies only when the stream parameter is set to true.

Properties

include_usage boolean (Optional) Default: false

Specifies whether to include token usage information in the last data block of the stream.

Valid values:

  • true

  • false

max_tokens integer (Optional)

The maximum number of tokens to generate in the output. If the generated content exceeds this value, the response is truncated.

  • For qwen-vl-ocr-latest, qwen-vl-ocr-2025-11-20, and qwen-vl-ocr-2024-10-28, the default and maximum values are the same as the model's maximum output length. For more information, see Availability.

  • For qwen-vl-ocr, qwen-vl-ocr-2025-04-13, and qwen-vl-ocr-2025-08-28, the default and maximum values are 4096.

    To increase this parameter's value to a number between 4097 and 8192, send an email to modelstudio@service.aliyun.com. Your email must include the following information: your Alibaba Cloud account ID, the image type (such as document, e-commerce, or contract), the model name, your estimated queries per second (QPS) and total daily requests, and the percentage of requests where the model output exceeds 4096 tokens.

logprobs boolean (Optional) Default: false

Specifies whether to return the log probabilities of the output tokens. Valid values:

  • true

  • false

top_logprobs integer (Optional) Default: 0

Specifies the number of most likely tokens to return at each generation step.

Value range: [0, 5]

This parameter takes effect only when logprobs is true.

temperature float (Optional) Default: 0.01

The sampling temperature controls the diversity of the text generated by the model.

A higher temperature results in more diverse text, while a lower temperature results in more deterministic text.

Value range: [0, 2)

Because both temperature and top_p control the diversity of the generated text, we recommend that you set only one of them.

We recommend that you use the default value.

top_p float (Optional) Default: 0.001

This parameter is the probability threshold for nucleus sampling, which controls the diversity of the text that the model generates.

A higher value results in more diverse text. A lower value results in more deterministic text.

Value range: (0, 1.0]

Because both temperature and top_p control text diversity, you should set only one of them.

We recommend that you use the default value.

top_k integer (Optional) Default: 1

Specifies the size of the candidate set for sampling during generation. For example, if you set the value to 50, only the 50 tokens with the highest scores are used as the candidate set for random sampling. A larger value increases randomness, while a smaller value increases determinism. If the value is None or greater than 100, the top_k policy is not enabled. In this case, only the top_p policy takes effect.

The value must be greater than or equal to 0.

This parameter is not a standard OpenAI parameter. When using the Python SDK, place this parameter in the extra_body object. For example: extra_body={"top_k": xxx}. When using the Node.js SDK or HTTP, pass this parameter at the top level.

We recommend that you use the default value.
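
For example, with the Python SDK the parameter travels in extra_body (a sketch; client and messages are assumed to be defined as in the earlier examples):

completion = client.chat.completions.create(
    model="qwen-vl-ocr-2025-11-20",
    messages=messages,
    # top_k is not a standard OpenAI parameter, so the Python SDK passes it through extra_body.
    extra_body={"top_k": 1},
)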

repetition_penalty float (Optional) Default: 1.0

The penalty for repeated sequences during model generation. A higher value can reduce repetition in the generated text. A value of 1.0 means no penalty is applied.

We recommend using the default value.

presence_penalty float (Optional) Default: 0.0

Controls the repetition of content in the text generated by the model.

The value must be within the range of -2.0 to 2.0. A positive value reduces repetition, and a negative value increases it.

Increase this value for scenarios that require diversity, creativity, or brainstorming, such as creative writing. Decrease this value for scenarios that emphasize consistency and terminological accuracy, such as technical documents or formal texts.

How it works

If the value of this parameter is positive, the model applies a penalty to tokens that already exist in the text. The penalty is applied regardless of the number of times a token appears. This reduces the likelihood of these tokens reappearing, which decreases content repetition and increases word diversity.

We recommend that you use the default value.

seed integer (Optional)

A random number seed. Using a seed ensures reproducible results. If you pass the same seed value in a call and keep other parameters unchanged, the model returns a deterministic result.

Value range: [0, 2³¹−1].

We recommend that you use the default value.

stop string or array (Optional)

Specifies the stop words. When a string or token_id specified in stop appears in the generated text, generation stops immediately.

You can use this parameter to specify sensitive words and control the model's output.

If stop is an array, you cannot mix token_ids and strings as elements. For example, you cannot specify ["Hello",104307].
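
For example, to stop generation at a marker string (a sketch; client and messages are assumed to be defined as in the earlier examples):

completion = client.chat.completions.create(
    model="qwen-vl-ocr-2025-11-20",
    messages=messages,
    # Generation halts as soon as either marker appears. Do not mix strings and token_ids.
    stop=["```", "END"],
)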

Chat response object (non-streaming output)

{
  "id": "chatcmpl-ba21fa91-dcd6-4dad-90cc-6d49c3c39094",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "```json\n{\n    \"seller_name\": \"null\",\n    \"buyer_name\": \"Cai Yingshi\",\n    \"price_excluding_tax\": \"230769.23\",\n    \"organization_code\": \"null\",\n    \"invoice_code\": \"142011726001\"\n}\n```",
        "refusal": null,
        "role": "assistant",
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1763283287,
  "model": "qwen-vl-ocr-latest",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 72,
    "prompt_tokens": 1185,
    "total_tokens": 1257,
    "completion_tokens_details": {
      "accepted_prediction_tokens": null,
      "audio_tokens": null,
      "reasoning_tokens": null,
      "rejected_prediction_tokens": null,
      "text_tokens": 72
    },
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": null,
      "image_tokens": 1001,
      "text_tokens": 184
    }
  }
}

id string

The unique identifier for this request.

choices array

An array of generated content from the model.

Properties

finish_reason string

The reason the model stopped generating.

Valid values:

  • stop: The output completed normally.

  • length: The output was truncated because it reached the maximum length.

index integer

The index of the current object in the choices array.

message object

The message generated by the model.

Properties

content string

The content returned by the Large Language Model (LLM).

refusal string

This parameter is always null.

role string

The role of the message. The value is always assistant.

audio object

This parameter is always null.

function_call object

This parameter is always null.

tool_calls array

This parameter is always null.

created integer

The UNIX timestamp when this request was created.

model string

The model used for this request.

object string

The value is always chat.completion.

service_tier string

This parameter is always null.

system_fingerprint string

This parameter is always null.

usage object

Token usage information for this request.

Properties

completion_tokens integer

The number of tokens in the model's output.

prompt_tokens integer

The number of tokens in the input.

total_tokens integer

The total number of tokens consumed. This is the sum of prompt_tokens and completion_tokens.

completion_tokens_details object

A fine-grained breakdown of the model's output tokens.

Properties

accepted_prediction_tokens integer

This parameter is always null.

audio_tokens integer

This parameter is always null.

reasoning_tokens integer

This parameter is always null.

text_tokens integer

The number of tokens corresponding to the model's text output.

rejected_prediction_tokens integer

This parameter is always null.

prompt_tokens_details object

A fine-grained breakdown of the input tokens.

Properties

audio_tokens integer

This parameter is always null.

cached_tokens integer

This parameter is always null.

text_tokens integer

The number of tokens corresponding to the model's text input.

image_tokens integer

The number of tokens corresponding to the model's image input.

Chat response chunk object (streaming output)

{"id":"chatcmpl-f6fbdc0d-78d6-418f-856f-f099c2e4859b","choices":[{"delta":{"content":"","function_call":null,"refusal":null,"role":"assistant","tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1764139204,"model":"qwen-vl-ocr-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-f6fbdc0d-78d6-418f-856f-f099c2e4859b","choices":[{"delta":{"content":"```","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1764139204,"model":"qwen-vl-ocr-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-f6fbdc0d-78d6-418f-856f-f099c2e4859b","choices":[{"delta":{"content":"json","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1764139204,"model":"qwen-vl-ocr-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-f6fbdc0d-78d6-418f-856f-f099c2e4859b","choices":[{"delta":{"content":"\n","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1764139204,"model":"qwen-vl-ocr-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-f6fbdc0d-78d6-418f-856f-f099c2e4859b","choices":[{"delta":{"content":"{\n","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1764139204,"model":"qwen-vl-ocr-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-f6fbdc0d-78d6-418f-856f-f099c2e4859b","choices":[{"delta":{"content":"   ","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1764139204,"model":"qwen-vl-ocr-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
......
{"id":"chatcmpl-f6fbdc0d-78d6-418f-856f-f099c2e4859b","choices":[{"delta":{"content":"```","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1764139204,"model":"qwen-vl-ocr-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-f6fbdc0d-78d6-418f-856f-f099c2e4859b","choices":[{"delta":{"content":"","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"created":1764139204,"model":"qwen-vl-ocr-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-f6fbdc0d-78d6-418f-856f-f099c2e4859b","choices":[],"created":1764139204,"model":"qwen-vl-ocr-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":{"completion_tokens":141,"prompt_tokens":513,"total_tokens":654,"completion_tokens_details":{"accepted_prediction_tokens":null,"audio_tokens":null,"reasoning_tokens":null,"rejected_prediction_tokens":null,"text_tokens":141},"prompt_tokens_details":{"audio_tokens":null,"cached_tokens":null,"image_tokens":332,"text_tokens":181}}}

id string

The unique identifier for this call. Each chunk object has the same id.

choices array

An array of generated content from the model. This field is empty in the last chunk if the include_usage parameter is set to true.

Properties

delta object

The output content returned in streaming mode.

Properties

content string

The content returned by the model.

function_call object

This parameter is currently set to null.

refusal object

This parameter is currently set to null.

role string

The role of the message object. This parameter is returned only in the first chunk.

finish_reason string

The reason the model stopped generating. Valid values:

  • null: Generation is still in progress.

  • stop: The output completed normally.

  • length: The output was truncated because it reached the maximum length.

index integer

The index of the current response in the choices array.

created integer

The UNIX timestamp when this request was created. Each chunk has the same timestamp.

model string

The model used for this request.

object string

The value is always chat.completion.chunk.

service_tier string

This parameter is currently set to null.

system_fingerprint string

This parameter is currently set to null.

usage object

The token usage statistics for the request. This object is returned only in the last chunk when the include_usage parameter is set to true.

Properties

completion_tokens integer

The number of tokens in the generated output.

prompt_tokens integer

The number of tokens in the input prompt.

total_tokens integer

The total number of tokens used in the request. This is the sum of prompt_tokens and completion_tokens.

completion_tokens_details object

A detailed breakdown of the tokens in the model's output.

Properties

accepted_prediction_tokens integer

This parameter is currently set to null.

audio_tokens integer

This parameter is currently set to null.

reasoning_tokens integer

This parameter is currently set to null.

text_tokens integer

The number of tokens corresponding to the model's text output.

rejected_prediction_tokens integer

This parameter is currently set to null.

prompt_tokens_details object

A detailed breakdown of the tokens in the input.

Properties

audio_tokens integer

This parameter is currently set to null.

cached_tokens integer

This parameter is currently set to null.

text_tokens integer

The number of tokens corresponding to the model's text input.

image_tokens integer

The number of tokens corresponding to the model's image input.

DashScope

Singapore region

For HTTP calls, the endpoint is: POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation

For SDK calls, set base_url to:

Python code

dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

Java code

  • Method 1:

    import com.alibaba.dashscope.protocol.Protocol;
    MultiModalConversation conv = new MultiModalConversation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
  • Method 2:

    import com.alibaba.dashscope.utils.Constants;
    Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";

US (Virginia) region

For HTTP calls, the endpoint is: POST https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation

For SDK calls, set base_url to:

Python code

dashscope.base_http_api_url = 'https://dashscope-us.aliyuncs.com/api/v1'

Java code

  • Method 1:

    import com.alibaba.dashscope.protocol.Protocol;
    MultiModalConversation conv = new MultiModalConversation(Protocol.HTTP.getValue(), "https://dashscope-us.aliyuncs.com/api/v1");
  • Method 2:

    import com.alibaba.dashscope.utils.Constants;
    Constants.baseHttpApiUrl="https://dashscope-us.aliyuncs.com/api/v1";

China (Beijing) region

For HTTP calls, the endpoint is: POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation

For SDK calls, you do not need to set base_url.

Get an API key and set it as an environment variable. If you use the DashScope SDK, you must also install the DashScope SDK.

Request body

High-precision recognition

The following code provides an example of how to call the built-in high-precision recognition task. For more information, see Call a built-in task.

Python

import os
import dashscope

# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
# If you use a model in the China (Beijing) region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [{
            "role": "user",
            "content": [{
                "image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg",
                # The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels are greater than min_pixels.
                "min_pixels": 32 * 32 * 3,
                # The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are less than max_pixels.
                "max_pixels": 32 * 32 * 8192,
                # Specifies whether to enable automatic image rotation.
                "enable_rotate": False}]
            }]
            
response = dashscope.MultiModalConversation.call(
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-ocr-2025-11-20',
    messages=messages,
    # Set the built-in task to high-precision recognition.
    ocr_options={"task": "advanced_recognition"}
)
# The high-precision recognition task returns the result as plain text.
print(response["output"]["choices"][0]["message"].content[0]["text"])

Java

// dashscope SDK version >= 2.21.8
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.aigc.multimodalconversation.OcrOptions;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {

    static {
        // The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
        // If you use a model in the China (Beijing) region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }
    
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        Map<String, Object> map = new HashMap<>();
        map.put("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg");
        // The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are less than max_pixels.
        map.put("max_pixels", 8388608);
        // The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels are greater than min_pixels.
        map.put("min_pixels", 3072);
        // Specifies whether to enable automatic image rotation.
        map.put("enable_rotate", false);
        
        // Configure the built-in OCR task.
        OcrOptions ocrOptions = OcrOptions.builder()
                .task(OcrOptions.Task.ADVANCED_RECOGNITION)
                .build();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        map
                        )).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-vl-ocr-2025-11-20")
                .message(userMessage)
                .ocrOptions(ocrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= Important =======
# API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, replace the base_url with https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation.
# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation.
# === Delete this comment before running ===

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '
{
  "model": "qwen-vl-ocr-2025-11-20",
  "input": {
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg",
            "min_pixels": 3072,
            "max_pixels": 8388608,
            "enable_rotate": false
          }
        ]
      }
    ]
  },
  "parameters": {
    "ocr_options": {
      "task": "advanced_recognition"
    }
  }
}
'

Information extraction

The following code provides an example of how to call the built-in information extraction task. For more information, see Call a built-in task.

Python

# Run pip install -U dashscope to update the SDK.

import os
import dashscope
# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
# If you use a model in the China (Beijing) region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
      {
        "role":"user",
        "content":[
          {
              "image":"http://duguang-labelling.oss-cn-shanghai.aliyuncs.com/demo_ocr/receipt_zh_demo.jpg",
              "min_pixels": 3072,
              "max_pixels": 8388608,
              "enable_rotate": False
          }
        ]
      }
    ]

params = {
  "ocr_options":{
    "task": "key_information_extraction",
    "task_config": {
      "result_schema": {
          "Ride Date": "Corresponds to the ride date and time in the image, in the format YYYY-MM-DD, for example, 2025-03-05",
          "Invoice Code": "Extract the invoice code from the image, usually a combination of numbers or letters",
          "Invoice Number": "Extract the number from the invoice, usually composed of only digits."
      }
    }
  }
}

response = dashscope.MultiModalConversation.call(
    # API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-ocr-2025-11-20',
    messages=messages,
    **params)

print(response.output.choices[0].message.content[0]["ocr_result"])

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.aigc.multimodalconversation.OcrOptions;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.google.gson.JsonObject;
import com.alibaba.dashscope.utils.Constants;

public class Main {

    static {
        // The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
        // If you use a model in the China (Beijing) region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }
    
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        Map<String, Object> map = new HashMap<>();
        map.put("image", "http://duguang-labelling.oss-cn-shanghai.aliyuncs.com/demo_ocr/receipt_zh_demo.jpg");
        // The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are less than max_pixels.
        map.put("max_pixels", 8388608);
        // The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels are greater than min_pixels.
        map.put("min_pixels", 3072);
         // Specifies whether to enable automatic image rotation.
        map.put("enable_rotate", false);
        
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        map
                        )).build();

        // Create the main JSON object.
        JsonObject resultSchema = new JsonObject();
        resultSchema.addProperty("Ride Date", "Corresponds to the ride date and time in the image, in the format YYYY-MM-DD, for example, 2025-03-05");
        resultSchema.addProperty("Invoice Code", "Extract the invoice code from the image, usually a combination of numbers or letters");
        resultSchema.addProperty("Invoice Number", "Extract the number from the invoice, usually composed of only digits.");


        // Configure the built-in OCR task.
        OcrOptions ocrOptions = OcrOptions.builder()
                .task(OcrOptions.Task.KEY_INFORMATION_EXTRACTION)
                .taskConfig(OcrOptions.TaskConfig.builder()
                        .resultSchema(resultSchema)
                        .build())
                .build();

        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
                // If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-vl-ocr-2025-11-20")
                .message(userMessage)
                .ocrOptions(ocrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("ocr_result"));
    }

    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= Important =======
# API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, replace the base_url with https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation.
# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation.
# === Delete this comment before running ===

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '
{
  "model": "qwen-vl-ocr-2025-11-20",
  "input": {
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "image": "http://duguang-labelling.oss-cn-shanghai.aliyuncs.com/demo_ocr/receipt_zh_demo.jpg",
            "min_pixels": 3072,
            "max_pixels": 8388608,
            "enable_rotate": false
          }
        ]
      }
    ]
  },
  "parameters": {
    "ocr_options": {
      "task": "key_information_extraction",
      "task_config": {
        "result_schema": {
            "Ride Date": "Corresponds to the ride date and time in the image, in the format YYYY-MM-DD, for example, 2025-03-05",
            "Invoice Code": "Extract the invoice code from the image, usually a combination of numbers or letters",
            "Invoice Number": "Extract the number from the invoice, usually composed of only digits."
        }
      }
    }
  }
}
'

Table parsing

The following code provides an example of how to call the built-in table parsing task. For more information, see Call a built-in task.

Python

import os
import dashscope
# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
# If you use a model in the China (Beijing) region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [{
            "role": "user",
            "content": [{
                "image": "http://duguang-llm.oss-cn-hangzhou.aliyuncs.com/llm_data_keeper/data/doc_parsing/tables/photo/eng/17.jpg",
                # The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels are greater than min_pixels.
                "min_pixels": 32 * 32 * 3,
                # The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are less than max_pixels.
                "max_pixels": 32 * 32 * 8192,
                # Specifies whether to enable automatic image rotation.
                "enable_rotate": False}]
           }]
           
response = dashscope.MultiModalConversation.call(
    # API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-ocr-2025-11-20',
    messages=messages,
    # Set the built-in task to table parsing.
    ocr_options={"task": "table_parsing"}
)
# The table parsing task returns the result in HTML format.
print(response["output"]["choices"][0]["message"].content[0]["text"])

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.aigc.multimodalconversation.OcrOptions;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {

    static {
        // The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
        // If you use a model in the China (Beijing) region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }
    
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        Map<String, Object> map = new HashMap<>();
        map.put("image", "https://duguang-llm.oss-cn-hangzhou.aliyuncs.com/llm_data_keeper/data/doc_parsing/tables/photo/eng/17.jpg");
        // The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are less than max_pixels.
        map.put("max_pixels", 8388608);
        // The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels are greater than min_pixels.
        map.put("min_pixels",3072);
        // Specifies whether to enable automatic image rotation.
        map.put("enable_rotate", false);
        
        // Configure the built-in OCR task.
        OcrOptions ocrOptions = OcrOptions.builder()
                .task(OcrOptions.Task.TABLE_PARSING)
                .build();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        map
                        )).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
                // If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-vl-ocr-2025-11-20")
                .message(userMessage)
                .ocrOptions(ocrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= Important =======
# API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, replace the base_url with https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation.
# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation.
# === Delete this comment before running ===

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '
{
  "model": "qwen-vl-ocr-2025-11-20",
  "input": {
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "image": "http://duguang-llm.oss-cn-hangzhou.aliyuncs.com/llm_data_keeper/data/doc_parsing/tables/photo/eng/17.jpg",
            "min_pixels": 3072,
            "max_pixels": 8388608,
            "enable_rotate": false
          }
        ]
      }
    ]
  },
  "parameters": {
    "ocr_options": {
      "task": "table_parsing"
    }
  }
}
'

Document parsing

The following code provides an example of how to call the built-in document parsing task. For more information, see Call a built-in task.

Python

import os
import dashscope

# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
# If you use a model in the China (Beijing) region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [{
            "role": "user",
            "content": [{
                "image": "https://img.alicdn.com/imgextra/i1/O1CN01ukECva1cisjyK6ZDK_!!6000000003635-0-tps-1500-1734.jpg",
                # The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels are greater than min_pixels.
                "min_pixels": 32 * 32 * 3,
                # The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are less than max_pixels.
                "max_pixels": 32 * 32 * 8192,
                # Specifies whether to enable automatic image rotation.
                "enable_rotate": False}]
            }]
            
response = dashscope.MultiModalConversation.call(
    # API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-ocr-2025-11-20',
    messages=messages,
    # Set the built-in task to document parsing.
    ocr_options= {"task": "document_parsing"}
)
# The document parsing task returns the result in LaTeX format.
print(response["output"]["choices"][0]["message"].content[0]["text"])

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.aigc.multimodalconversation.OcrOptions;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {

    static {
        // The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
        // If you use a model in the China (Beijing) region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }
    
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        Map<String, Object> map = new HashMap<>();
        map.put("image", "https://img.alicdn.com/imgextra/i1/O1CN01ukECva1cisjyK6ZDK_!!6000000003635-0-tps-1500-1734.jpg");
        // The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are less than max_pixels.
        map.put("max_pixels", 8388608);
        // The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels are greater than min_pixels.
        map.put("min_pixels", 3072);
        // Specifies whether to enable automatic image rotation.
        map.put("enable_rotate", false);
        
        // Configure the built-in OCR task.
        OcrOptions ocrOptions = OcrOptions.builder()
                .task(OcrOptions.Task.DOCUMENT_PARSING)
                .build();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        map
                        )).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
                // If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-vl-ocr-2025-11-20")
                .message(userMessage)
                .ocrOptions(ocrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= Important =======
# API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, replace the base_url with https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation.
# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation.
# === Delete this comment before running ===

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
"model": "qwen-vl-ocr-2025-11-20",
"input": {
  "messages": [
    {
      "role": "user",
      "content": [{
          "image": "https://img.alicdn.com/imgextra/i1/O1CN01ukECva1cisjyK6ZDK_!!6000000003635-0-tps-1500-1734.jpg",
          "min_pixels": 3072,
          "max_pixels": 8388608,
          "enable_rotate": false
        }
      ]
    }
  ]
},
"parameters": {
  "ocr_options": {
    "task": "document_parsing"
  }
}
}
'

Formula recognition

The following code provides an example of how to call the built-in formula recognition task. For more information, see Call a built-in task.

Python

import os
import dashscope

# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
# If you use a model in the China (Beijing) region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [{
    "role": "user",
    "content": [{
        "image": "http://duguang-llm.oss-cn-hangzhou.aliyuncs.com/llm_data_keeper/data/formula_handwriting/test/inline_5_4.jpg",
        # The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels are greater than min_pixels.
        "min_pixels": 32 * 32 * 3,
        # The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are less than max_pixels.
        "max_pixels": 32 * 32 * 8192,
        # Specifies whether to enable automatic image rotation.
        "enable_rotate": False
    }]
}]
            
response = dashscope.MultiModalConversation.call(
    # API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-ocr-2025-11-20',
    messages=messages,
    # Set the built-in task to formula recognition.
    ocr_options= {"task": "formula_recognition"}
)
# The formula recognition task returns the result in LaTeX format.
print(response["output"]["choices"][0]["message"].content[0]["text"])

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.aigc.multimodalconversation.OcrOptions;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {

    static {
        // The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
        // If you use a model in the China (Beijing) region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }
    
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        Map<String, Object> map = new HashMap<>();
        map.put("image", "http://duguang-llm.oss-cn-hangzhou.aliyuncs.com/llm_data_keeper/data/formula_handwriting/test/inline_5_4.jpg");
        // The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are less than max_pixels.
        map.put("max_pixels", 8388608);
        // The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels are greater than min_pixels.
        map.put("min_pixels", 3072);
        // Specifies whether to enable automatic image rotation.
        map.put("enable_rotate", false);
        
        // Configure the built-in OCR task.
        OcrOptions ocrOptions = OcrOptions.builder()
                .task(OcrOptions.Task.FORMULA_RECOGNITION)
                .build();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        map
                        )).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
                // If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-vl-ocr-2025-11-20")
                .message(userMessage)
                .ocrOptions(ocrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= Important =======
# API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, replace the base_url with https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation.
# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation.
# === Delete this comment before running ===

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '
{
  "model": "qwen-vl-ocr",
  "input": {
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "image": "http://duguang-llm.oss-cn-hangzhou.aliyuncs.com/llm_data_keeper/data/formula_handwriting/test/inline_5_4.jpg",
            "min_pixels": 3072,
            "max_pixels": 8388608,
            "enable_rotate": false
          }
        ]
      }
    ]
  },
  "parameters": {
    "ocr_options": {
      "task": "formula_recognition"
    }
  }
}
'

General text recognition

The following code provides an example of how to call the built-in general text recognition task. For more information, see Call a built-in task.

Python

import os
import dashscope

# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
# If you use a model in the China (Beijing) region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [{
            "role": "user",
            "content": [{
                "image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg",
                # The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels are greater than min_pixels.
                "min_pixels": 32 * 32 * 3,
                # The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are less than max_pixels.
                "max_pixels": 32 * 32 * 8192,
                # Specifies whether to enable automatic image rotation.
                "enable_rotate": False}]
        }]
        
response = dashscope.MultiModalConversation.call(
    # API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-ocr-2025-11-20',
    messages=messages,
    # Set the built-in task to general text recognition.
    ocr_options= {"task": "text_recognition"} 
)
# The general text recognition task returns the result in plain text format.
print(response["output"]["choices"][0]["message"].content[0]["text"])

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.aigc.multimodalconversation.OcrOptions;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {

    static {
      // The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
      // If you use a model in the China (Beijing) region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }

    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        Map<String, Object> map = new HashMap<>();
        map.put("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg");
        // The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are less than max_pixels.
        map.put("max_pixels", 8388608);
        // The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels are greater than min_pixels.
        map.put("min_pixels", 3072);
        // Specifies whether to enable automatic image rotation.
        map.put("enable_rotate", false);
        
        // Configure the built-in task.
        OcrOptions ocrOptions = OcrOptions.builder()
                .task(OcrOptions.Task.TEXT_RECOGNITION)
                .build();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        map
                        )).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
                // If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-vl-ocr-2025-11-20")
                .message(userMessage)
                .ocrOptions(ocrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= Important =======
# API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, replace the base_url with https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation.
# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation.
# === Delete this comment before running ===

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
"model": "qwen-vl-ocr-2025-11-20",
"input": {
  "messages": [
    {
      "role": "user",
      "content": [{
          "image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg",
          "min_pixels": 3072,
          "max_pixels": 8388608,
          "enable_rotate": false
        }
      ]
    }
  ]
},
"parameters": {
  "ocr_options": {
      "task": "text_recognition"
    }
}
}'

Multilingual recognition

The following code provides an example of how to call the built-in general multilingual recognition task. For more information, see Call a built-in task.

Python

import os
import dashscope

# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
# If you use a model in the China (Beijing) region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [{
            "role": "user",
            "content": [{
                "image": "https://img.alicdn.com/imgextra/i2/O1CN01VvUMNP1yq8YvkSDFY_!!6000000006629-2-tps-6000-3000.png",
                # The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels are greater than min_pixels.
                "min_pixels": 32 * 32 * 3,
                # The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are less than max_pixels.
                "max_pixels": 32 * 32 * 8192,
                # Specifies whether to enable automatic image rotation.
                "enable_rotate": False}]
            }]
            
response = dashscope.MultiModalConversation.call(
    # API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-ocr-2025-11-20',
    messages=messages,
    # Set the built-in task to multilingual recognition.
    ocr_options={"task": "multi_lan"}
)
# The multilingual recognition task returns the result as plain text.
print(response["output"]["choices"][0]["message"].content[0]["text"])

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.aigc.multimodalconversation.OcrOptions;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {

    static {
      // The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1.
      // If you use a model in the China (Beijing) region, change the base_url to https://dashscope.aliyuncs.com/api/v1.
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }
    
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        Map<String, Object> map = new HashMap<>();
        map.put("image", "https://img.alicdn.com/imgextra/i2/O1CN01VvUMNP1yq8YvkSDFY_!!6000000006629-2-tps-6000-3000.png");
        // The maximum pixel threshold for the input image. If the image is larger than this value, it is scaled down until the total pixels are less than max_pixels.
        map.put("max_pixels", 8388608);
        // The minimum pixel threshold for the input image. If the image is smaller than this value, it is scaled up until the total pixels are greater than min_pixels.
        map.put("min_pixels", 3072);
        // Specifies whether to enable automatic image rotation.
        map.put("enable_rotate", false);
        
        // Configure the built-in OCR task.
        OcrOptions ocrOptions = OcrOptions.builder()
                .task(OcrOptions.Task.MULTI_LAN)
                .build();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        map
                        )).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
                // If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-vl-ocr-2025-11-20")
                .message(userMessage)
                .ocrOptions(ocrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= Important =======
# API keys vary by region. To get an API key, see https://www.alibabacloud.com/help/model-studio/get-api-key.
# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, replace the base_url with https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation.
# If you use a model in the China (Beijing) region, replace the base_url with https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation.
# === Delete this comment before running ===

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '
{
  "model": "qwen-vl-ocr-2025-11-20",
  "input": {
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "image": "https://img.alicdn.com/imgextra/i2/O1CN01VvUMNP1yq8YvkSDFY_!!6000000006629-2-tps-6000-3000.png",
            "min_pixels": 3072,
            "max_pixels": 8388608,
            "enable_rotate": false
          }
        ]
      }
    ]
  },
  "parameters": {
    "ocr_options": {
      "task": "multi_lan"
    }
  }
}
'

Streaming output

Python

import os
import dashscope

PROMPT_TICKET_EXTRACTION = """
Please extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID card number, and passenger name from the train ticket image.
You must accurately extract the key information. Do not omit or fabricate information. Replace any single character that is blurry or obscured by strong light with an English question mark (?).
Return the data in JSON format as follows: {'invoice_number': 'xxx', 'departure_station': 'xxx', 'arrival_station': 'xxx', 'departure_date_and_time':'xxx', 'seat_number': 'xxx','ticket_price':'xxx', 'id_card_number': 'xxx', 'passenger_name': 'xxx'},
"""

# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1
# If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {
        "role": "user",
        "content": [
            {
                "image": "https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg",
                # The minimum pixel threshold for the input image. If the image has fewer pixels than this value, it is enlarged until its total pixel count exceeds min_pixels.
                "min_pixels": 32 * 32 * 3,
                # The maximum pixel threshold for the input image. If the image has more pixels than this value, it is reduced until its total pixel count is below max_pixels.
                "max_pixels": 32 * 32 * 8192},
            # When no built-in task is set, you can pass a prompt in the text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
            {
                "type": "text",
                "text": PROMPT_TICKET_EXTRACTION,
            },
        ],
    }
]
response = dashscope.MultiModalConversation.call(
    # API keys for different regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
    # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen-vl-ocr-2025-11-20",
    messages=messages,
    stream=True,
    incremental_output=True,
)
full_content = ""
print("Streaming output content:")
for response in response:
    try:
        print(response["output"]["choices"][0]["message"].content[0]["text"])
        full_content += response["output"]["choices"][0]["message"].content[0]["text"]
    except:
        pass
print(f"Full content: {full_content}")

Java

import java.util.*;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.utils.Constants;

public class Main {

    // The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to https://dashscope-us.aliyuncs.com/api/v1
    // If you use a model in the China (Beijing) region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
    static {
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }

    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        Map<String, Object> map = new HashMap<>();
        map.put("image", "https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg");
        // The maximum pixel threshold for the input image. If the image has more pixels than this value, it is reduced until its total pixel count is below max_pixels.
        map.put("max_pixels", 8388608);
        // The minimum pixel threshold for the input image. If the image has fewer pixels than this value, it is enlarged until its total pixel count exceeds min_pixels.
        map.put("min_pixels", 3072);
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        map,
                        // When no built-in task is set, you can pass a prompt in the text field. If no prompt is passed, the default prompt is used: Please output only the text content from the image without any additional descriptions or formatting.
                        Collections.singletonMap("text", "Please extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID card number, and passenger name from the train ticket image. You must accurately extract the key information. Do not omit or fabricate information. Replace any single character that is blurry or obscured by strong light with an English question mark (?). Return the data in JSON format as follows: {\'invoice_number\': \'xxx\', \'departure_station\': \'xxx\', \'arrival_station\': \'xxx\', \'departure_date_and_time\':\'xxx\', \'seat_number\': \'xxx\',\'ticket_price\':\'xxx\', \'id_card_number\': \'xxx\', \'passenger_name\': \'xxx\'"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys for different regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
                // If the environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-vl-ocr-2025-11-20")
                .message(userMessage)
                .incrementalOutput(true)
                .build();
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(item -> {
            try {
                List<Map<String, Object>> contentList = item.getOutput().getChoices().get(0).getMessage().getContent();
                if (!contentList.isEmpty()){
                    System.out.println(contentList.get(0).get("text"));
                }
            } catch (Exception e){
                System.exit(0);
            }
        });
    }

    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= Important =======
# API keys for different regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
# The following is the base URL for the Singapore region. If you use a model in the US (Virginia) region, change the base_url to: https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# If you use a model in the China (Beijing) region, change the base_url to: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before execution ===

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--header 'X-DashScope-SSE: enable' \
--data '{
    "model": "qwen-vl-ocr-2025-11-20",
    "input":{
        "messages":[
          {
            "role": "user",
            "content": [
                {
                    "image": "https://img.alicdn.com/imgextra/i2/O1CN01ktT8451iQutqReELT_!!6000000004408-0-tps-689-487.jpg",
                    "min_pixels": 3072,
                    "max_pixels": 8388608
                },
                {"type": "text", "text": "Please extract the invoice number, train number, departure station, arrival station, departure date and time, seat number, seat class, ticket price, ID card number, and passenger name from the train ticket image. You must accurately extract the key information. Do not omit or fabricate information. Replace any single character that is blurry or obscured by strong light with an English question mark (?). Return the data in JSON format as follows: {\'invoice_number\': \'xxx\', \'departure_station\': \'xxx\', \'arrival_station\': \'xxx\', \'departure_date_and_time\':\'xxx\', \'seat_number\': \'xxx\',\'ticket_price\':\'xxx\', \'id_card_number\': \'xxx\', \'passenger_name\': \'xxx\'}"}
            ]
          }
        ]
    },
    "parameters": {
        "incremental_output": true
    }
}'

model string (Required)

The name of the model. See Qwen-OCR for a list of supported models.

messages array (Required)

The context for the model, provided as a sequence of messages in conversational order.

When you call the API over HTTP, place the messages object inside the input object.

Message types

User Message object (Required)

A user message that passes questions, instructions, or context to the model.

Properties

content string or array (Required)

The message content. Use a string for text-only input. Use an array if the input includes image data.

Properties

text string (Optional)

The input text.

The default value is Please output only the text content from the image without any additional descriptions or formatting. This default behavior instructs the model to extract all text from the image.

image string (Optional)

The URL, Base64 Data URL, or local path of the image. For more information about passing a local file, see Passing local files.

Example: {"image":"https://xxxx.jpeg"}

enable_rotate boolean (Optional) Default: false

Specifies whether to correct skewed images.

Valid values:

  • true

  • false

Example: {"image":"https://xxxx.jpeg","enable_rotate": True}

min_pixels integer (Optional)

The minimum pixel threshold for the input image.

If an input image has fewer pixels than min_pixels, the model enlarges the image until its total pixel count is greater than min_pixels.

Conversion between image tokens and pixels

The number of pixels per image token varies by model:

  • qwen-vl-ocr-latest, qwen-vl-ocr-2025-11-20: Each token corresponds to 32×32 pixels.

  • qwen-vl-ocr, qwen-vl-ocr-2025-08-28, and earlier models: Each token corresponds to 28×28 pixels.

Value range for min_pixels

  • qwen-vl-ocr-latest, qwen-vl-ocr-2025-11-20: The default and minimum value is 3072 (that is, 3×32×32).

  • qwen-vl-ocr, qwen-vl-ocr-2025-08-28, and earlier models: The default and minimum value is 3136 (that is, 4×28×28).

Example: {"image":"https://xxxx.jpeg","min_pixels": 3072}

max_pixels integer (Optional)

The maximum pixel threshold for the input image.

If an image's pixel count is within the [min_pixels, max_pixels] range, the model processes it at its original size. If the pixel count exceeds max_pixels, the model scales down the image until its total pixel count is less than or equal to max_pixels.

Conversion between image tokens and pixels

The number of pixels per image token varies by model:

  • qwen-vl-ocr-latest, qwen-vl-ocr-2025-11-20: Each token corresponds to 32×32 pixels.

  • qwen-vl-ocr, qwen-vl-ocr-2025-08-28, and earlier models: Each token corresponds to 28×28 pixels.

Value range for max_pixels

  • qwen-vl-ocr-latest, qwen-vl-ocr-2025-11-20

    • Default value: 8388608 (that is, 8192×32×32)

    • Maximum value: 30720000 (that is, 30000×32×32)

  • qwen-vl-ocr, qwen-vl-ocr-2025-08-28, and earlier models

    • Default value: 6422528 (that is, 8192×28×28)

    • Maximum value: 23520000 (that is, 30000×28×28)

Example: {"image":"https://xxxx.jpeg","max_pixels": 8388608}

role string (Required)

The role for a user message. The value must be user.

max_tokens integer (Optional)

The maximum number of tokens to generate in the output. If the generated content exceeds this value, the response is truncated.

  • For qwen-vl-ocr-latest, qwen-vl-ocr-2025-11-20, and qwen-vl-ocr-2024-10-28, the default and maximum values are the same as the model's maximum output length. For more information, see Availability.

  • For qwen-vl-ocr, qwen-vl-ocr-2025-04-13, and qwen-vl-ocr-2025-08-28, the default and maximum values are 4096.

    To increase this parameter's value to a number between 4097 and 8192, send an email to modelstudio@service.aliyun.com. Your email must include the following information: your Alibaba Cloud account ID, the image type (such as document, e-commerce, or contract), the model name, your estimated queries per second (QPS) and total daily requests, and the percentage of requests where the model output exceeds 4096 tokens.
In the Java SDK, the parameter is maxTokens. For HTTP calls, set max_tokens in the parameters object.

ocr_options object (Optional)

The parameters to configure when you call a built-in task with the Qwen-OCR model. When you call a built-in task, you do not need to pass a text prompt, because the model uses the default prompt for that task. For more information, see Call a built-in task.

Properties

task string (Required)

The name of the built-in task. Valid values are:

  • text_recognition: General text recognition

  • key_information_extraction: Information extraction

  • document_parsing: Document parsing

  • table_parsing: Table parsing

  • formula_recognition: Formula recognition

  • multi_lan: Multilingual recognition

  • advanced_recognition: High-precision recognition

task_config object (Optional)

When task is set to key_information_extraction (Information extraction), this parameter specifies the fields to extract. If you do not specify task_config, the model extracts all fields from the image by default.

Properties

result_schema object (Optional)

Specifies the fields for the model to extract. The value must be a JSON object. You can nest JSON objects up to three layers deep.

Specify the name of the field to extract in the JSON object's key. The corresponding value can be empty. To achieve higher extraction accuracy, you can provide a field description or format requirement in the value.

Example:

"result_schema": {
     "invoice_number": "The unique identification number of the invoice, usually a combination of numbers and letters.",
     "issue_date": "The date the invoice was issued. Extract it in YYYY-MM-DD format, for example, 2023-10-26.",
     "seller_name": "The full company name of the seller shown on the invoice.",
     "total_amount": "The total amount on the invoice, including tax. Extract the numerical value and keep two decimal places, for example, 123.45." 
}
In the Java SDK, this parameter is named OcrOptions. The minimum required version for the DashScope Python SDK is 1.22.2, and for the Java SDK is 2.18.4.
For HTTP calls, place ocr_options in the parameters object.
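
The following is a minimal sketch of calling the built-in information extraction task with a custom result_schema through the DashScope Python SDK, following the pattern of the examples above. The image URL is a placeholder; replace it with your own image.

import os
import dashscope

dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [{
    "role": "user",
    # Placeholder image URL; replace with your own invoice image.
    "content": [{"image": "https://example.com/invoice.jpg"}]
}]

response = dashscope.MultiModalConversation.call(
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-ocr-2025-11-20',
    messages=messages,
    # Built-in information extraction task with a custom field schema.
    ocr_options={
        "task": "key_information_extraction",
        "task_config": {
            "result_schema": {
                "invoice_number": "",
                "total_amount": "The total amount including tax, with two decimal places."
            }
        }
    }
)
print(response["output"]["choices"][0]["message"].content[0]["text"])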

seed integer (Optional)

A random number seed. Using a seed ensures reproducible results. If you pass the same seed value in a call and keep other parameters unchanged, the model returns a deterministic result.

Value range: [0, 2³¹−1].

We recommend that you use the default value.
When you make an HTTP call, place seed in the parameters object.

temperature float (Optional) Default: 0.01

The sampling temperature controls the diversity of the text generated by the model.

A higher temperature results in more diverse text, while a lower temperature results in more deterministic text.

Value range: [0, 2)

Because both temperature and top_p control the diversity of the generated text, you should set only one of them.

We recommend that you use the default value.
When you make an HTTP call, place temperature in the parameters object.

top_p float (Optional) Default: 0.001

This parameter is the probability threshold for nucleus sampling, which controls the diversity of the text that the model generates.

A higher value results in more diverse text. A lower value results in more deterministic text.

Value range: (0, 1.0]

Because both temperature and top_p control text diversity, you should set only one of them.

We recommend that you use the default value.
In the Java SDK, the parameter is topP. For HTTP calls, place top_p in the parameters object.

top_k integer (Optional) Default: 1

Specifies the size of the candidate set for sampling during generation. For example, if you set the value to 50, only the 50 tokens with the highest scores are used as the candidate set for random sampling. A larger value increases randomness, while a smaller value increases determinism. If the value is None or greater than 100, the top_k policy is not enabled. In this case, only the top_p policy takes effect.

The value must be greater than or equal to 0.

This parameter is not a standard OpenAI parameter. When using the Python SDK, place this parameter in the extra_body object. For example: extra_body={"top_k": xxx}. When using the Node.js SDK or HTTP, pass this parameter at the top level.

We recommend that you use the default value.
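
A minimal sketch of passing top_k through the OpenAI-compatible endpoint with the Python SDK. The image URL is a placeholder:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen-vl-ocr-2025-11-20",
    messages=[{
        "role": "user",
        "content": [
            # Placeholder image URL; replace with your own image.
            {"type": "image_url", "image_url": {"url": "https://example.com/doc.jpg"}},
        ],
    }],
    # top_k is not a standard OpenAI parameter, so it goes in extra_body.
    extra_body={"top_k": 1},
)
print(completion.choices[0].message.content)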

repetition_penalty float (Optional) Default: 1.0

The penalty for repeated sequences during model generation. A higher value can reduce repetition in the generated text. A value of 1.0 means no penalty is applied.

We recommend using the default value.
In the Java SDK, the parameter is repetitionPenalty. For HTTP calls, add repetition_penalty to the parameters object.

presence_penalty float (Optional) Default: 0.0

Controls the repetition of content in the text generated by the model.

The value must be within the range of -2.0 to 2.0. A positive value reduces repetition, and a negative value increases it.

Increase this value for scenarios that require diversity, creativity, or brainstorming, such as creative writing. Decrease this value for scenarios that emphasize consistency and terminological accuracy, such as technical documents or formal texts.

How it works

If the value of this parameter is positive, the model applies a penalty to tokens that already exist in the text. The penalty is applied regardless of the number of times a token appears. This reduces the likelihood of these tokens reappearing, which decreases content repetition and increases word diversity.

We recommend that you use the default value.

stream boolean (Optional) Default: false

Specifies whether to stream the response. Valid values:

  • false: The model returns the result at once after all content is generated.

  • true: The model outputs content in chunks as it is generated.

This parameter is supported only by the Python SDK. To use streaming output with the Java SDK, you can call the streamCall interface. To use streaming output over HTTP, set X-DashScope-SSE to enable in the header.

incremental_output boolean (Optional) Default: false

Specifies whether to enable incremental output in streaming output mode. The recommended setting is true.

Valid values:

  • false: Each output contains the entire sequence generated so far. The final output is the complete result.

    I
    I like
    I like apple
    I like apple.
  • true (Recommended): Enables incremental output. Subsequent outputs contain only the newly generated content. You must concatenate these segments to retrieve the complete result.

    I
    like
    apple
    .
In the Java SDK, the parameter is incrementalOutput. For HTTP calls, add incremental_output to the parameters object.

stop string or array (Optional)

Specifies the stop words. When a string or token_id specified in stop appears in the generated text, generation stops immediately.

You can use this parameter to specify sensitive words and control the model's output.

If stop is an array, you cannot mix token_ids and strings as elements. For example, you cannot specify ["Hello",104307].

logprobs boolean (Optional) Default: false

Specifies whether to return the log probabilities of output tokens. Valid values:

  • true

  • false

Supported models: qwen-vl-ocr-2025-04-13 and later models.

For HTTP calls, place logprobs in the parameters object.

top_logprobs integer (Optional) Default: 0

Specifies the number of most likely tokens to return at each generation step. This parameter applies only when logprobs is set to true.

The value must be an integer from 0 to 5.

In the Java SDK, the parameter is named topLogprobs. For HTTP calls, set the top_logprobs parameter in the parameters object.
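
The following is a minimal sketch of requesting log probabilities with the DashScope Python SDK. It assumes that logprobs and top_logprobs are accepted as call parameters and that the result can be read from the logprobs object described in the response section below. The image URL is a placeholder.

import os
import dashscope

dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

response = dashscope.MultiModalConversation.call(
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-ocr-2025-11-20',
    # Placeholder image URL; replace with your own image.
    messages=[{"role": "user", "content": [{"image": "https://example.com/doc.jpg"}]}],
    logprobs=True,
    top_logprobs=2,
)
choice = response["output"]["choices"][0]
# Each element of logprobs["content"] describes one generated token (assumed shape).
for item in choice["logprobs"]["content"]:
    print(item["token"], item["logprob"])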

Chat response object (same format for streaming and non-streaming outputs)

{"status_code": 200,
  "request_id": "8f8c0f6e-6805-4056-bb65-d26d66080a41",
  "code": "",
  "message": "",
  "output": {
    "text": null,
    "finish_reason": null,
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "ocr_result": {
                "kv_result": {
                  "price_excluding_tax": "230769.23",
                  "invoice_code": "142011726001",
                  "organization_code": "null",
                  "buyer_name": "Cai Yingshi",
                  "seller_name": "null"
                }
              },
              "text": "```json\n{\n    \"price_excluding_tax\": \"230769.23\",\n    \"invoice_code\": \"142011726001\",\n    \"organization_code\": \"null\",\n    \"buyer_name\": \"Cai Yingshi\",\n    \"seller_name\": \"null\"\n}\n```"
            }
          ]
        }
      }
    ],
    "audio": null
  },
  "usage": {
    "input_tokens": 926,
    "output_tokens": 72,
    "characters": 0,
    "image_tokens": 754,
    "input_tokens_details": {
      "image_tokens": 754,
      "text_tokens": 172
    },
    "output_tokens_details": {
      "text_tokens": 72
    },
    "total_tokens": 998
  }
}

status_code integer

The status code of the request. A value of 200 indicates that the request is successful. Otherwise, the request has failed.

The Java SDK does not return this parameter. If a call fails, an exception is thrown. The exception message contains the `status_code` and `message`.

request_id string

The unique identifier for this call.

The Java SDK returns this parameter as requestId.

code string

The error code. This parameter is empty if the call is successful.

Only the Python SDK returns this parameter.

output object

Information about the call result.

Properties

text string

This parameter is currently fixed to null.

finish_reason string

The reason that the model stopped generating. Valid values:

  • null: The generation is still in progress.

  • stop: The model output ended naturally.

  • length: The generation stopped because the output reached the maximum length.

choices array

The output information from the model.

Properties

finish_reason string

The reason that the model stopped generating. Valid values:

  • null: The generation is still in progress.

  • stop: The model output ended naturally.

  • length: The output was truncated because it reached the maximum length.

message object

The message object that is output by the model.

Properties

role string

The role of the output message. The value is fixed to assistant.

content array

The content of the output message.

Properties

ocr_result object

When you use a Qwen-OCR model to call a built-in information extraction or high-precision recognition task, this parameter contains the task result.

Properties

kv_result object

The result of the information extraction task.

words_info array

The result of the high-precision recognition task.

Properties

rotate_rect array

Example: [center_x, center_y, width, height, angle]

The rotated rectangle representation of the text box:

  • center_x and center_y are the coordinates of the text box's centroid.

  • width is the width of the text box, and height is the height.

  • angle is the rotation angle of the text box relative to the horizontal direction. The value range is [-90, 90].

location array

Example: [x1, y1, x2, y2, x3, y3, x4, y4]

The coordinates of the four vertices of the text box. The coordinates are arranged in clockwise order, starting from the top-left vertex: top-left → top-right → bottom-right → bottom-left.

text string

The content of the text line.
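
The rotate_rect and location formats describe the same box. The following sketch converts a rotate_rect into the four clockwise corner points of the location format, assuming the angle is in degrees and standard image coordinates (y axis pointing down):

import math

def rotate_rect_to_corners(rect):
    # Convert [center_x, center_y, width, height, angle] to
    # [x1, y1, x2, y2, x3, y3, x4, y4], clockwise from the top-left vertex.
    cx, cy, w, h, angle = rect
    theta = math.radians(angle)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    # Corner offsets relative to the center, before rotation:
    # top-left, top-right, bottom-right, bottom-left.
    for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]:
        corners.extend([cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t])
    return corners

print(rotate_rect_to_corners([100, 50, 80, 20, 0]))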

text string

The text content of the output message.

logprobs object

The probability information for the current `choices` object.

Properties

content array

An array of tokens that have log probability information.

Properties

token string

The current token.

bytes array

A list of the original UTF-8 bytes of the current token. This helps accurately reconstruct the output content, especially when you handle emojis and Chinese characters.

logprob float

The log probability of the current token. A `null` return value indicates an extremely low probability.

top_logprobs array

The most likely tokens at the current token's position and their log probabilities. The number of elements is the same as the value of the top_logprobs input parameter.

Properties

token string

The current token.

bytes array

A list of the original UTF-8 bytes of the current token. This helps accurately reconstruct the output content, especially when you handle emojis and Chinese characters.

logprob float

The log probability of the current token. A `null` return value indicates an extremely low probability.

usage object

Information about the tokens that are used in this request.

Properties

input_tokens integer

The number of input tokens.

output_tokens integer

The number of output tokens.

characters integer

This parameter is currently fixed to 0.

input_tokens_details object

A fine-grained classification of input tokens.

Properties

image_tokens integer

The number of tokens that correspond to the image that is input to the model.

text_tokens integer

The number of tokens that correspond to the text that is input to the model.

output_tokens_details object

A fine-grained classification of output tokens.

Properties

text_tokens integer

The number of tokens that correspond to the text that is output by the model.

total_tokens integer

The total number of tokens consumed. This is the sum of input_tokens and output_tokens.

image_tokens integer

This field is returned if the input includes an image. It represents the number of tokens that correspond to the image input.
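
As a consistency check against the sample response above: input_tokens equals the sum of the input_tokens_details fields, and total_tokens equals input_tokens plus output_tokens.

# Usage figures copied from the sample response above.
usage = {
    "input_tokens": 926,
    "output_tokens": 72,
    "input_tokens_details": {"image_tokens": 754, "text_tokens": 172},
    "total_tokens": 998,
}

details = usage["input_tokens_details"]
assert usage["input_tokens"] == details["image_tokens"] + details["text_tokens"]  # 754 + 172 = 926
assert usage["total_tokens"] == usage["input_tokens"] + usage["output_tokens"]    # 926 + 72 = 998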

Error codes

If a model call returns an error, see Error messages to resolve the issue.