Alibaba Cloud Model Studio: Data mining (Qwen-Doc-Turbo)

Last Updated: Mar 15, 2026

The data mining model extracts information, moderates content, classifies data, and generates summaries. Unlike general-purpose chat models, which may return inconsistent formats or extract information incorrectly, it outputs structured data (such as JSON) quickly and accurately.

Note

This document applies only to the China (Beijing) region. To use the model, you must use an API key from the China (Beijing) region.

Implementation guide

Qwen-Doc-Turbo supports extracting information from files in three ways. For more information about file size and type limits, see Limitations.

| Feature | File URL (Recommended) | File ID | Plain text |
|---|---|---|---|
| File source | Public URL | Local file (upload required) | Passed as a string |
| Input length limit | Up to 10 files; supports large files (max input 253k tokens) | 1 file; supports large files (max input 253k tokens) | Up to 9,000 tokens |
| SDK compatibility | DashScope only | Upload: OpenAI; call: OpenAI and DashScope | OpenAI and DashScope |
| Key advantages | No upload to Model Studio required. Supports batch calls. | Avoids repeated uploads. Ideal for reuse. | No file management required. |
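
The comparison above can be read as a small decision rule. The following is a minimal sketch of that rule, not part of any SDK: the function name `choose_input_method` and its return labels are hypothetical, and the caller is assumed to already know the token count of any plain text.

```python
MAX_DOC_URLS = 10              # File URL method: up to 10 files per request
MAX_PLAIN_TEXT_TOKENS = 9_000  # Plain text method: up to 9,000 tokens


def choose_input_method(source: str, count: int = 1, text_tokens: int = 0) -> str:
    """Pick an input method from the comparison table (hypothetical helper).

    source: "url" for publicly accessible files, "local" for files on disk,
            "text" for content that is already a string.
    """
    if source == "url":
        if not 1 <= count <= MAX_DOC_URLS:
            raise ValueError(f"doc_url accepts 1 to {MAX_DOC_URLS} URLs per request")
        return "doc_url"
    if source == "local":
        if count != 1:
            raise ValueError("a file-id request references exactly one file")
        return "file_id"
    if source == "text":
        # Text over the limit has to go through a file instead; this sketch
        # picks the file-id route, but a file URL would work as well.
        return "plain_text" if text_tokens <= MAX_PLAIN_TEXT_TOKENS else "file_id"
    raise ValueError(f"unknown source: {source!r}")
```

For example, `choose_input_method("url", count=2)` selects the file URL method, while `choose_input_method("text", text_tokens=12_000)` falls back to a file upload.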

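Prerequisites

Get an API key from the China (Beijing) region and set it as the DASHSCOPE_API_KEY environment variable, which the examples below read. The examples use the DashScope SDK or the OpenAI SDK, depending on the method.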

Pass a file URL

Extract structured data using file URLs (up to 10 files simultaneously). This example passes the Sample Product Manual A and Sample Product Manual B files and prompts the model to return extracted information in JSON format.

The File URL method supports only the DashScope protocol. Use the DashScope Python SDK or HTTP calls (such as curl).

Python

import os
import dashscope

try:
    # Make the call inside the try block so that network or SDK exceptions are also caught
    response = dashscope.Generation.call(
        api_key=os.getenv('DASHSCOPE_API_KEY'),  # If you have not set the environment variable, replace this with your API key
        model='qwen-doc-turbo',
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "From these two product manuals, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed)."
                    },
                    {
                        "type": "doc_url",
                        "doc_url": [
                            "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251107/jockge/%E7%A4%BA%E4%BE%8B%E4%BA%A7%E5%93%81%E6%89%8B%E5%86%8CA.docx",
                            "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251107/ztwxzr/%E7%A4%BA%E4%BE%8B%E4%BA%A7%E5%93%81%E6%89%8B%E5%86%8CB.docx"
                        ],
                        "file_parsing_strategy": "auto"
                    }
                ]
            }
        ]
    )
    if response.status_code == 200:
        print(response.output.choices[0].message.content)
    else:
        print(f"Request failed, status code: {response.status_code}")
        print(f"Error code: {response.code}")
        print(f"Error message: {response.message}")
        print("For more information, see https://www.alibabacloud.com/help/en/model-studio/developer-reference/error-codes")
except Exception as e:
    print(f"An error occurred: {e}")
    print("For more information, see https://www.alibabacloud.com/help/en/model-studio/developer-reference/error-codes")

curl

curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'X-DashScope-SSE: enable' \
--data '{
    "model": "qwen-doc-turbo",
    "input": {
        "messages": [
                {
                    "role": "system",
                    "content": "You are a helpful assistant."
                },
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "From these two product manuals, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed)."
                        },
                        {
                            "type": "doc_url",
                            "doc_url": [
                                "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251107/jockge/%E7%A4%BA%E4%BE%8B%E4%BA%A7%E5%93%81%E6%89%8B%E5%86%8CA.docx",
                                "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251107/ztwxzr/%E7%A4%BA%E4%BE%8B%E4%BA%A7%E5%93%81%E6%89%8B%E5%86%8CB.docx"
                            ],
                            "file_parsing_strategy": "auto"
                        }
                    ]
                }
            ]
    }
}'
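
The user message content in the requests above can be assembled and validated before the call is made. The sketch below is illustrative only: `build_doc_url_content` is a hypothetical helper, the 10-URL limit and the three `file_parsing_strategy` values are taken from the Limitations and FAQ sections, and the scheme check is a rough stand-in for "publicly accessible" that cannot verify actual reachability.

```python
ALLOWED_STRATEGIES = {"auto", "text_only", "text_and_images"}
MAX_DOC_URLS = 10


def build_doc_url_content(prompt: str, urls: list, strategy: str = "auto") -> list:
    """Build the user message content for a doc_url request (hypothetical helper)."""
    if not 1 <= len(urls) <= MAX_DOC_URLS:
        raise ValueError(f"expected 1 to {MAX_DOC_URLS} URLs, got {len(urls)}")
    if strategy not in ALLOWED_STRATEGIES:
        raise ValueError(f"unknown file_parsing_strategy: {strategy!r}")
    for url in urls:
        # Only a scheme check; it does not prove the URL is publicly reachable.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an HTTP(S) URL: {url!r}")
    return [
        {"type": "text", "text": prompt},
        {"type": "doc_url", "doc_url": list(urls), "file_parsing_strategy": strategy},
    ]
```

The returned list can be used as the `content` value of the user message in the Python example, or serialized with `json.dumps` for the HTTP body.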

Response example

[
  {
    "model": "PRO-100",
    "name": "Smart Printer",
    "price": "8999"
  },
  {
    "model": "PRO-200",
    "name": "Smart Scanner",
    "price": "12999"
  },
  ...
  {
    "model": "SEC-400",
    "name": "Smart Visitor System",
    "price": "9999"
  },
  {
    "model": "SEC-500",
    "name": "Smart Parking Management",
    "price": "22999"
  }
]
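
Because the model returns the array as text, downstream code usually needs to parse and validate it. The following is a minimal sketch under two assumptions that the source does not state: the model might surround the array with extra text, so the parser keeps only the span from the first `[` to the last `]`, and prices are whole numbers as in the sample. The helper name `parse_products` is hypothetical.

```python
import json

REQUIRED_KEYS = {"model", "name", "price"}


def parse_products(raw: str) -> list:
    """Parse the model's text output into product records (hypothetical helper)."""
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model output")
    items = json.loads(raw[start:end + 1])
    products = []
    for item in items:
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"product is missing keys: {sorted(missing)}")
        products.append({
            "model": item["model"],
            "name": item["name"],
            # The sample returns prices as digit strings; convert to int.
            "price": int(item["price"]),
        })
    return products
```

If your documents contain fractional prices, swap `int` for `decimal.Decimal`.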

Pass a file ID

Upload a file

Before running the code, download Sample Product Manual A and place it in your project directory. Upload the file through the OpenAI-compatible interface to get a file-id. For upload API details, see the API reference.

Python

import os
from pathlib import Path
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not set the environment variable, replace this with your API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # Enter the DashScope service base_url
)

file_object = client.files.create(file=Path("Sample Product Manual A.docx"), purpose="file-extract")
# Print the file-id for use in subsequent model calls
print(file_object.id)

Java

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.models.files.*;

import java.nio.file.Path;
import java.nio.file.Paths;

public class Main {
    public static void main(String[] args) {
        // Create a client and use the API key from the environment variable
        OpenAIClient client = OpenAIOkHttpClient.builder()
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                .build();
        // Set the file path. Modify the path and filename as needed.
        Path filePath = Paths.get("src/main/java/org/example/Sample Product Manual A.docx");
        // Create file upload parameters
        FileCreateParams fileParams = FileCreateParams.builder()
                .file(filePath)
                .purpose(FilePurpose.of("file-extract"))
                .build();

        // Upload the file and print the file-id
        FileObject fileObject = client.files().create(fileParams);
        // Print the file-id for use in subsequent model calls
        System.out.println(fileObject.id());
    }
}

curl

curl --location --request POST 'https://dashscope.aliyuncs.com/compatible-mode/v1/files' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --form 'file=@"Sample Product Manual A.docx"' \
  --form 'purpose="file-extract"'

Run the code to obtain the file-id for the uploaded file.

Pass information and start a conversation using a file ID

Pass the file-id in a system message (after the role-setting message). The user message contains your query about the file.

Python

import os
from openai import OpenAI, BadRequestError

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"), # If you have not set the environment variable, replace this with your API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

try:
    completion = client.chat.completions.create(
        model="qwen-doc-turbo",
        messages=[
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            # Replace '{FILE_ID}' with the file-id from your scenario
            {'role': 'system', 'content': 'fileid://{FILE_ID}'},
            {'role': 'user', 'content': 'From this product manual, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed).'}
        ],
        # This code example uses streaming output to clearly show the model's output process. For non-streaming output examples, see https://www.alibabacloud.com/help/en/model-studio/user-guide/text-generation
        stream=True,
        stream_options={"include_usage": True}
    )

    full_content = ""
    for chunk in completion:
        if chunk.choices and chunk.choices[0].delta.content:
            full_content += chunk.choices[0].delta.content
            print(chunk.model_dump())
    
    print(full_content)

except BadRequestError as e:
    print(f"Error message: {e}")
    print("For more information, see https://www.alibabacloud.com/help/en/model-studio/developer-reference/error-codes")

Java

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.core.http.StreamResponse;
import com.openai.models.chat.completions.*;

public class Main {
    public static void main(String[] args) {
        // Create a client and use the API key from the environment variable
        OpenAIClient client = OpenAIOkHttpClient.builder()
                // If you have not set the environment variable, replace the next line with your Model Studio API key: .apiKey("sk-xxx");
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                .build();

        ChatCompletionCreateParams chatParams = ChatCompletionCreateParams.builder()
                .addSystemMessage("You are a helpful assistant.")
                // Replace '{FILE_ID}' with the file-id from your scenario
                .addSystemMessage("fileid://{FILE_ID}")
                .addUserMessage("From this product manual, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed).")
                .model("qwen-doc-turbo")
                .build();

        try (StreamResponse<ChatCompletionChunk> streamResponse = client.chat().completions().createStreaming(chatParams)) {
            streamResponse.stream().forEach(chunk -> {
                String content = chunk.choices().get(0).delta().content().orElse("");
                if (!content.isEmpty()) {
                    System.out.print(content);
                }
            });
        } catch (Exception e) {
            System.err.println("Error message: " + e.getMessage());
        }
    }
}

curl

curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header "Content-Type: application/json" \
--data '{
    "model": "qwen-doc-turbo",
    "messages": [
        {"role": "system","content": "You are a helpful assistant."},
        {"role": "system","content": "fileid://{FILE_ID}"},
        {"role": "user","content": "From this product manual, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed)."}
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    }
}'
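
The streaming examples above set `stream_options` with `include_usage`. In the OpenAI streaming format this typically adds a final chunk whose `choices` list is empty and whose `usage` field carries the token counts; the sketch below assumes the compatible-mode endpoint behaves the same way, which the source does not confirm. `collect_stream` is a hypothetical helper that works on any iterable of chunk objects.

```python
def collect_stream(chunks):
    """Join streamed text and capture the usage report, if any (hypothetical helper).

    Returns (full_text, usage). usage stays None when no chunk carries one.
    """
    parts = []
    usage = None
    for chunk in chunks:
        # Assumption: the usage-only chunk has an empty choices list.
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts), usage
```

With the Python example above, `text, usage = collect_stream(completion)` would replace the manual loop.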

Complete example: Upload a file and call the model

Python

import os
import time
from pathlib import Path
from openai import OpenAI, BadRequestError

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not set the environment variable, replace this with your API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

try:
    # Step 1: Upload the file
    file_object = client.files.create(file=Path("Sample Product Manual A.docx"), purpose="file-extract")
    file_id = file_object.id
    print(f"File uploaded successfully. file-id: {file_id}")
    
    # Step 2: Wait for the file to finish parsing (optional, may be needed for large files)
    # If the file is still parsing, the API returns an error. You must retry.
    max_retries = 10
    retry_count = 0
    
    while retry_count < max_retries:
        try:
            # Step 3: Call the model using the file-id
            completion = client.chat.completions.create(
                model="qwen-doc-turbo",
                messages=[
                    {'role': 'system', 'content': 'You are a helpful assistant.'},
                    {'role': 'system', 'content': f'fileid://{file_id}'},
                    {'role': 'user', 'content': 'From this product manual, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed).'}
                ],
                stream=True,
                stream_options={"include_usage": True}
            )
            
            # Step 4: Process the model output
            full_content = ""
            for chunk in completion:
                if chunk.choices and chunk.choices[0].delta.content:
                    full_content += chunk.choices[0].delta.content
                    print(chunk.choices[0].delta.content, end='', flush=True)
            
            print(f"\n\nFull output:\n{full_content}")
            break
            
        except BadRequestError as e:
            if "File parsing in progress" in str(e):
                retry_count += 1
                print(f"File is parsing. Retrying after a delay ({retry_count}/{max_retries})...")
                time.sleep(2)  # Wait 2 seconds and retry
            else:
                raise e
    
    if retry_count >= max_retries:
        print("File parsing timed out. Try again later.")

except BadRequestError as e:
    print(f"Error message: {e}")
    print("For more information, see https://www.alibabacloud.com/help/en/model-studio/developer-reference/error-codes")
except Exception as e:
    print(f"An error occurred: {e}")

Java

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.core.http.StreamResponse;
import com.openai.models.chat.completions.*;
import com.openai.models.files.*;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;

public class Main {
    public static void main(String[] args) {
        // Create a client
        OpenAIClient client = OpenAIOkHttpClient.builder()
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                .build();
        
        try {
            // Step 1: Upload the file
            Path filePath = Paths.get("src/main/java/org/example/Sample Product Manual A.docx");
            FileCreateParams fileParams = FileCreateParams.builder()
                    .file(filePath)
                    .purpose(FilePurpose.of("file-extract"))
                    .build();
            
            FileObject fileObject = client.files().create(fileParams);
            String fileId = fileObject.id();
            System.out.println("File uploaded successfully. file-id: " + fileId);
            
            // Step 2: Wait for the file to parse and then call the model (max 10 retries)
            int maxRetries = 10;
            int retryCount = 0;
            boolean success = false;
            
            while (retryCount < maxRetries && !success) {
                try {
                    // Step 3: Call the model using the file-id
                    ChatCompletionCreateParams chatParams = ChatCompletionCreateParams.builder()
                            .addSystemMessage("You are a helpful assistant.")
                            .addSystemMessage("fileid://" + fileId)
                            .addUserMessage("From this product manual, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed).")
                            .model("qwen-doc-turbo")
                            .build();
                    
                    // Step 4: Process the model output
                    try (StreamResponse<ChatCompletionChunk> streamResponse = 
                            client.chat().completions().createStreaming(chatParams)) {
                        streamResponse.stream().forEach(chunk -> {
                            String content = chunk.choices().get(0).delta().content().orElse("");
                            if (!content.isEmpty()) {
                                System.out.print(content);
                            }
                        });
                        System.out.println();
                        success = true;
                    }
                    
                } catch (Exception e) {
                    if (e.getMessage() != null && e.getMessage().contains("File parsing in progress")) {
                        retryCount++;
                        System.out.println("File is parsing. Retrying after a delay (" + retryCount + "/" + maxRetries + ")...");
                        TimeUnit.SECONDS.sleep(2);  // Wait 2 seconds and retry
                    } else {
                        throw e;
                    }
                }
            }
            
            if (!success) {
                System.out.println("File parsing timed out. Try again later.");
            }
            
        } catch (Exception e) {
            System.err.println("Error message: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Response example

[
  {
    "model": "PRO-100",
    "name": "Smart Printer",
    "price": "8999"
  },
  {
    "model": "PRO-200",
    "name": "Smart Scanner",
    "price": "12999"
  },
  ...
  {
    "model": "SEC-400",
    "name": "Smart Visitor System",
    "price": "9999"
  },
  {
    "model": "SEC-500",
    "name": "Smart Parking Management",
    "price": "22999"
  }
]

Pass plain text

You can pass the file content directly as a string instead of using a file-id. Put the content in its own system message, after the role-setting system message, so that the model does not mistake the document text for its role instructions.

If the text content exceeds 9,000 tokens, use a file URL or file ID instead (due to API body size limits).

Python

import os
from openai import OpenAI, BadRequestError

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"), # If you have not set the environment variable, replace this with your API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

try:
    completion = client.chat.completions.create(
        model="qwen-doc-turbo",
        messages=[
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'system', 'content': 'Smart Office Product Manual Version: V2.0 Release Date: January 2024 Table of Contents 1.1 Product Overview...'},
            {'role': 'user', 'content': 'From this product manual, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed).'}
        ],
        # This code example uses streaming output to clearly show the model's output process. For non-streaming output examples, see https://www.alibabacloud.com/help/en/model-studio/user-guide/text-generation
        stream=True,
        stream_options={"include_usage": True}
    )

    full_content = ""
    for chunk in completion:
        if chunk.choices and chunk.choices[0].delta.content:
            full_content += chunk.choices[0].delta.content
            print(chunk.model_dump())
    
    print(full_content)

except BadRequestError as e:
    print(f"Error message: {e}")
    print("For more information, see https://www.alibabacloud.com/help/en/model-studio/developer-reference/error-codes")

Java

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.core.http.StreamResponse;
import com.openai.models.chat.completions.*;

public class Main {
    public static void main(String[] args) {
        // Create a client and use the API key from the environment variable
        OpenAIClient client = OpenAIOkHttpClient.builder()
                // If you have not set the environment variable, replace the next line with your Model Studio API key: .apiKey("sk-xxx");
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                .build();

        ChatCompletionCreateParams chatParams = ChatCompletionCreateParams.builder()
                .addSystemMessage("You are a helpful assistant.")
                .addSystemMessage("Smart Office Product Manual Version: V2.0 Release Date: January 2024 Table of Contents 1.1 Product Overview...")
                .addUserMessage("From this product manual, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed).")
                .model("qwen-doc-turbo")
                .build();

        try (StreamResponse<ChatCompletionChunk> streamResponse = client.chat().completions().createStreaming(chatParams)) {
            streamResponse.stream().forEach(chunk -> {
                String content = chunk.choices().get(0).delta().content().orElse("");
                if (!content.isEmpty()) {
                    System.out.print(content);
                }
            });
        } catch (Exception e) {
            System.err.println("Error message: " + e.getMessage());
        }
    }
}

curl

curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header "Content-Type: application/json" \
--data '{
    "model": "qwen-doc-turbo",
    "messages": [
        {"role": "system","content": "You are a helpful assistant."},
        {"role": "system","content": "Smart Office Product Manual Version: V2.0 Release Date: January 2024 Table of Contents 1.1 Product Overview..."},
        {"role": "user","content": "From this product manual, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed)."}
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    }
}'
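
The file-ID and plain-text methods share one message layout: a role-setting system message, a second system message holding either `fileid://...` or the document text, and a user message with the query. A minimal sketch of that layout follows; `build_messages` is a hypothetical helper, not part of either SDK.

```python
def build_messages(query, *, file_id=None, text=None,
                   role_prompt="You are a helpful assistant."):
    """Build the messages array for a file-ID or plain-text request (hypothetical helper)."""
    if (file_id is None) == (text is None):
        raise ValueError("pass exactly one of file_id or text")
    document = f"fileid://{file_id}" if file_id is not None else text
    return [
        {"role": "system", "content": role_prompt},  # role-setting message comes first
        {"role": "system", "content": document},     # file reference or document text
        {"role": "user", "content": query},          # the question about the document
    ]
```

The result can be passed as `messages` to `client.chat.completions.create` in the Python examples above.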

Response example

[
  {
    "model": "PRO-100",
    "name": "Smart Printer",
    "price": "8999"
  },
  {
    "model": "PRO-200",
    "name": "Smart Scanner",
    "price": "12999"
  },
  ...
  {
    "model": "SEC-400",
    "name": "Smart Visitor System",
    "price": "9999"
  },
  {
    "model": "SEC-500",
    "name": "Smart Parking Management",
    "price": "22999"
  }
]

Model pricing

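| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per million tokens) | Output cost (per million tokens) | Free quota |
|---|---|---|---|---|---|---|
| qwen-doc-turbo | 262,144 | 253,952 | 32,768 | $0.087 | $0.144 | No free quota |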
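
As a worked example of the pricing above, the cost of a single request is the input tokens times the input price plus the output tokens times the output price, both per million tokens. The sketch below uses only the figures listed in this section; `estimate_cost` is a hypothetical helper, and actual billing may differ from this estimate.

```python
INPUT_PRICE_PER_MILLION = 0.087   # dollars per million input tokens
OUTPUT_PRICE_PER_MILLION = 0.144  # dollars per million output tokens
MAX_INPUT_TOKENS = 253_952
MAX_OUTPUT_TOKENS = 32_768


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one qwen-doc-turbo request (hypothetical helper)."""
    if not 0 <= input_tokens <= MAX_INPUT_TOKENS:
        raise ValueError(f"input_tokens must be between 0 and {MAX_INPUT_TOKENS}")
    if not 0 <= output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"output_tokens must be between 0 and {MAX_OUTPUT_TOKENS}")
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION)
```

A request with 100,000 input tokens and 2,000 output tokens comes to 0.1 × 0.087 + 0.002 × 0.144 = $0.008988.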

FAQ

  1. Where are files stored after being uploaded through the OpenAI compatible file interface?

    Files uploaded via the OpenAI-compatible interface are stored free of charge in your Model Studio bucket. To query and manage files, see OpenAI file interface.

  2. When using the file URL method, what are the differences between the file_parsing_strategy parameter options?

    "auto": parses automatically based on the file content. "text_only": parses text only. "text_and_images": parses both images and text (increases parsing time).

  3. How can I determine if a file has finished parsing?

    Try starting a conversation with the file ID. If the file is still parsing, the API returns the error "File parsing in progress, please try again later." In that case, wait and retry. If the call succeeds, the file is ready.

  4. Does the parsing process after file upload incur any extra costs?

    Document parsing is free of charge.
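
The retry approach described in question 3 can be wrapped in a reusable helper. The complete example earlier retries with a fixed 2-second delay; the sketch below swaps in exponential backoff, which is a design choice of this example rather than something the source prescribes. `call_when_parsed` and its parameters are hypothetical, and the match on the error text assumes the message quoted in question 3.

```python
import time

PARSING_MARKER = "File parsing in progress"


def call_when_parsed(call, max_retries=10, base_delay=2.0, max_delay=30.0,
                     sleep=time.sleep):
    """Run call() and retry while the file is still parsing (hypothetical helper).

    call: a zero-argument function that performs the model request.
    sleep: injectable so the helper can be tested without real waiting.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:
            # Re-raise anything that is not the parsing error, and give up
            # once the retry budget is used.
            if PARSING_MARKER not in str(exc) or attempt == max_retries:
                raise
            sleep(min(base_delay * 2 ** attempt, max_delay))
```

Typical usage would be `call_when_parsed(lambda: client.chat.completions.create(...))`, with the same arguments as in the file-ID examples.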

API reference

For the input and output parameters of Qwen-Doc-Turbo, see Qwen API reference.

Error codes

If the model call fails and returns an error message, see Error messages for resolution.

Limitations

  • SDK dependencies:

    • File URL (doc_url): Supports only the DashScope protocol. Use the DashScope Python SDK or HTTP calls (such as curl).

    • Upload file (file-id): Must use an OpenAI-compatible SDK for upload and management.

  • File upload and reference:

    • File URL (doc_url): Up to 10 URLs per request. URLs must be publicly accessible.

    • Upload file (file-id): Max 150 MB per file. Account limits: 10,000 files or 100 GB total (files never expire). Each request references one file only.

      Upload requests fail when limits are reached. Delete unneeded files to free quota. See OpenAI compatible - File for details.

    • Supported formats: TXT, DOC, DOCX, PDF, XLS, XLSX, MD, PPT, PPTX, JPG, JPEG, PNG, GIF, and BMP.

  • API input:

    • Using doc_url or file-id: the context window is 262,144 tokens, of which at most 253,952 tokens can be input (see Model pricing).

    • Plain text in user/system messages: max 9,000 tokens per message.

  • API output:

    • The maximum output length is 32,768 tokens.

  • File sharing:

    • A file-id works only within the account that generated it. It cannot be used across accounts or with RAM user API keys.

  • Rate limit: See Rate limits.
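
Some of the limits above can be checked locally before uploading. The following is a minimal sketch, assuming that 150 MB means 150 × 1024 × 1024 bytes (the source does not say which definition of MB applies) and that the file format can be judged from the extension alone. `check_upload` is a hypothetical helper.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {
    ".txt", ".doc", ".docx", ".pdf", ".xls", ".xlsx", ".md",
    ".ppt", ".pptx", ".jpg", ".jpeg", ".png", ".gif", ".bmp",
}
MAX_FILE_BYTES = 150 * 1024 * 1024  # assumption: MB counted as 2**20 bytes


def check_upload(path) -> list:
    """Return a list of problems that would block an upload (hypothetical helper).

    An empty list means the file passes the local checks.
    """
    file_path = Path(path)
    problems = []
    if file_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {file_path.suffix or '(none)'}")
    if not file_path.is_file():
        problems.append("file not found")
    elif file_path.stat().st_size > MAX_FILE_BYTES:
        problems.append("file is larger than 150 MB")
    return problems
```

Account-level limits (10,000 files or 100 GB in total) cannot be checked this way; they depend on what is already stored under the account.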