The data mining model, Qwen-Doc-Turbo, extracts information, moderates content, classifies data, and generates summaries. It returns structured output (such as JSON) in a consistent format. General-purpose chat models, by contrast, may return inconsistent formats or extract information incorrectly.
This document applies only to the China (Beijing) region. To use the model, you must use an API key from the China (Beijing) region.
Implementation guide
Qwen-Doc-Turbo supports extracting information from files in three ways. For more information about file size and type limits, see Limitations.
| Feature | File URL (Recommended) | File ID | Plain text |
|---|---|---|---|
| File source | Public URL | Local file (upload required) | Passed as a string |
| Input limit per request | Up to 10 files | 1 file | Up to 9,000 tokens |
| SDK compatibility | DashScope protocol only (DashScope Python SDK or HTTP) | Upload: OpenAI-compatible SDK | |
| Key advantages | No upload to Model Studio required. Supports batch calls. | Avoids repeated uploads. Ideal for reuse. | No file management required. |
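The table above can be read as a decision rule. The following is a minimal sketch of a hypothetical helper, `choose_input_method`, that is not part of any SDK. It assumes the file URL method is preferred whenever public URLs exist, because the table marks it as recommended. It also assumes you already know the token count of any plain text you plan to send.

```python
from typing import Optional, Sequence

# Limits copied from the comparison table (also listed under Limitations).
MAX_DOC_URLS = 10              # file URL: up to 10 files per request
MAX_PLAIN_TEXT_TOKENS = 9_000  # plain text: up to 9,000 tokens


def choose_input_method(public_urls: Sequence[str] = (),
                        local_files: Sequence[str] = (),
                        text_tokens: Optional[int] = None) -> str:
    """Hypothetical helper: return "doc_url", "file_id", or "plain_text" for one request."""
    if public_urls:
        if len(public_urls) > MAX_DOC_URLS:
            raise ValueError(
                f"{len(public_urls)} URLs given; split them into requests of at most {MAX_DOC_URLS}")
        return "doc_url"
    if local_files:
        # A file-id request references exactly one file.
        if len(local_files) > 1:
            raise ValueError("upload and query local files one at a time")
        return "file_id"
    if text_tokens is not None:
        if text_tokens > MAX_PLAIN_TEXT_TOKENS:
            raise ValueError("text exceeds 9,000 tokens; use a file URL or file ID instead")
        return "plain_text"
    raise ValueError("no input supplied")
```

For example, `choose_input_method(local_files=["Sample Product Manual A.docx"])` returns `"file_id"`.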
Prerequisites
- You have created an API key and exported it as an environment variable.
- If you plan to call the model using an SDK, you have installed the OpenAI SDK or the DashScope SDK.
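Every example on this page reads the key from `DASHSCOPE_API_KEY`. If the variable is missing, `os.getenv` returns `None`, and the request then fails with a less obvious authentication error. The sketch below fails fast with a clear message instead. The helper name `require_api_key` is hypothetical.

```python
import os


def require_api_key(var_name: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key from the environment, or raise with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export a China (Beijing) region API key, "
            f"for example: export {var_name}=sk-xxx"
        )
    return key
```

You can pass the result as `api_key=require_api_key()` in place of `os.getenv("DASHSCOPE_API_KEY")`.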
Pass a file URL
Extract structured data from files passed by URL (up to 10 files per request). This example passes the Sample Product Manual A and Sample Product Manual B files and asks the model to return the extracted information in JSON format.
The file URL method supports only the DashScope protocol. Use the DashScope Python SDK or HTTP calls (such as curl).
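In a file URL request, the user message's `content` is a list. It holds a `text` part and a `doc_url` part, and the `doc_url` part carries the URL list and `file_parsing_strategy`. The sketch below builds that `messages` structure and checks the documented limits before sending: at most 10 URLs, and one of the three strategy values listed in the FAQ. The function name is hypothetical. The `http(s)` scheme check is only a rough stand-in, because it cannot confirm that a URL is actually publicly accessible.

```python
from typing import List

# file_parsing_strategy values described in the FAQ on this page.
PARSING_STRATEGIES = {"auto", "text_only", "text_and_images"}
MAX_DOC_URLS = 10


def build_doc_url_messages(prompt: str, urls: List[str], strategy: str = "auto") -> list:
    """Build a DashScope-style messages list that passes documents by URL (hypothetical helper)."""
    if not 1 <= len(urls) <= MAX_DOC_URLS:
        raise ValueError(f"pass between 1 and {MAX_DOC_URLS} URLs, got {len(urls)}")
    if strategy not in PARSING_STRATEGIES:
        raise ValueError(f"unknown file_parsing_strategy: {strategy!r}")
    for url in urls:
        # Rough check only: the service needs a publicly reachable URL.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an http(s) URL: {url}")
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "doc_url", "doc_url": list(urls), "file_parsing_strategy": strategy},
            ],
        },
    ]
```

Pass the returned list as `messages=` to `dashscope.Generation.call`.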
Python
```python
import os
import dashscope

try:
    response = dashscope.Generation.call(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not set the environment variable, replace this with your API key
        model="qwen-doc-turbo",
        # Return results in the messages format so that output.choices is populated
        result_format="message",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "From these two product manuals, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed).",
                    },
                    {
                        "type": "doc_url",
                        "doc_url": [
                            "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251107/jockge/%E7%A4%BA%E4%BE%8B%E4%BA%A7%E5%93%81%E6%89%8B%E5%86%8CA.docx",
                            "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251107/ztwxzr/%E7%A4%BA%E4%BE%8B%E4%BA%A7%E5%93%81%E6%89%8B%E5%86%8CB.docx",
                        ],
                        "file_parsing_strategy": "auto",
                    },
                ],
            },
        ],
    )
    if response.status_code == 200:
        print(response.output.choices[0].message.content)
    else:
        print(f"Request failed, status code: {response.status_code}")
        print(f"Error code: {response.code}")
        print(f"Error message: {response.message}")
        print("For more information, see https://www.alibabacloud.com/help/en/model-studio/developer-reference/error-codes")
except Exception as e:
    print(f"An error occurred: {e}")
    print("For more information, see https://www.alibabacloud.com/help/en/model-studio/developer-reference/error-codes")
```
curl
```bash
curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'X-DashScope-SSE: enable' \
--data '{
    "model": "qwen-doc-turbo",
    "input": {
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "From these two product manuals, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed)."
                    },
                    {
                        "type": "doc_url",
                        "doc_url": [
                            "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251107/jockge/%E7%A4%BA%E4%BE%8B%E4%BA%A7%E5%93%81%E6%89%8B%E5%86%8CA.docx",
                            "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251107/ztwxzr/%E7%A4%BA%E4%BE%8B%E4%BA%A7%E5%93%81%E6%89%8B%E5%86%8CB.docx"
                        ],
                        "file_parsing_strategy": "auto"
                    }
                ]
            }
        ]
    }
}'
```
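The prompt in these examples asks for a JSON array of objects with `model`, `name`, and `price`. Before using the reply, it is worth parsing it and checking those keys. The sketch below does that. It assumes the model may sometimes wrap the array in a Markdown code fence, which it strips if present. The function name and the sample data in the test are hypothetical.

```python
import json
import re
from typing import Any, Dict, List

REQUIRED_KEYS = ("model", "name", "price")


def parse_products(reply: str) -> List[Dict[str, Any]]:
    """Parse the JSON array requested by the prompt and check each object's keys."""
    text = reply.strip()
    # Strip a surrounding Markdown code fence if the model added one.
    fenced = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {i} is not a JSON object")
        missing = [k for k in REQUIRED_KEYS if k not in item]
        if missing:
            raise ValueError(f"item {i} is missing {missing}")
    return data
```

With the Python SDK example, you would call `parse_products(response.output.choices[0].message.content)`. The `price` value is not type-checked, because the prompt does not say whether it should be a number or a string.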
Pass a file ID
Upload a file
Before running the code, download Sample Product Manual A and place it in your project directory. Upload the file through the OpenAI-compatible interface to get a file-id. For upload API details, see the API reference.
Python
```python
import os
from pathlib import Path
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not set the environment variable, replace this with your API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # Enter the DashScope service base_url
)

file_object = client.files.create(file=Path("Sample Product Manual A.docx"), purpose="file-extract")
# Print the file-id for use in subsequent model calls
print(file_object.id)
```
Java
```java
import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.models.files.*;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Main {
    public static void main(String[] args) {
        // Create a client that reads the API key from the environment variable
        OpenAIClient client = OpenAIOkHttpClient.builder()
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                .build();
        // Set the file path. Modify the path and filename as needed.
        Path filePath = Paths.get("src/main/java/org/example/Sample Product Manual A.docx");
        // Create the file upload parameters
        FileCreateParams fileParams = FileCreateParams.builder()
                .file(filePath)
                .purpose(FilePurpose.of("file-extract"))
                .build();
        // Upload the file
        FileObject fileObject = client.files().create(fileParams);
        // Print the file-id for use in subsequent model calls
        System.out.println(fileObject.id());
    }
}
```
curl
```bash
curl --location --request POST 'https://dashscope.aliyuncs.com/compatible-mode/v1/files' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--form 'file=@"Sample Product Manual A.docx"' \
--form 'purpose="file-extract"'
```
Run the code to obtain the file-id for the uploaded file.
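Uploads fail if a file exceeds the per-file size limit or has an unsupported format, both of which are listed under Limitations. The sketch below is a local pre-check you could run before `client.files.create`. The helper name is hypothetical. Treating "150 MB" as 150 × 1024 × 1024 bytes is an assumption, because the page does not say whether MB means 10^6 or 2^20 bytes. The extension check only inspects the filename, not the file contents.

```python
from pathlib import Path

# Limits from the Limitations section of this page.
MAX_UPLOAD_BYTES = 150 * 1024 * 1024  # assumption: 150 MB interpreted as MiB
SUPPORTED_SUFFIXES = {
    ".txt", ".doc", ".docx", ".pdf", ".xls", ".xlsx", ".md",
    ".ppt", ".pptx", ".jpg", ".jpeg", ".png", ".gif", ".bmp",
}


def check_upload(path: Path) -> None:
    """Raise ValueError if the file is unlikely to be accepted for upload (hypothetical helper)."""
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported file type: {path.suffix or '(none)'}")
    size = path.stat().st_size  # raises FileNotFoundError if the file does not exist
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{path.name} is {size} bytes; the per-file limit is 150 MB")
```

For example, call `check_upload(Path("Sample Product Manual A.docx"))` before uploading.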
Start a conversation using a file ID
Pass the file-id as `fileid://{FILE_ID}` in a second system message, placed after the role-setting system message. Put your question about the file in the user message.
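The message order described above can be built by a small helper. This sketch, `build_fileid_messages`, is hypothetical. It takes a single file-id, because a request can reference only one uploaded file, and it adds the `fileid://` prefix itself. The id `file-abc123` used in the test is a placeholder, not a real file-id format guarantee.

```python
def build_fileid_messages(file_id: str, question: str) -> list:
    """Role-setting message, then the file reference, then the user's question."""
    if not file_id or "://" in file_id:
        raise ValueError("pass the bare file-id; the fileid:// prefix is added here")
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "system", "content": f"fileid://{file_id}"},
        {"role": "user", "content": question},
    ]
```

The returned list can be passed as `messages=` to `client.chat.completions.create`.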
Python
```python
import os
from openai import OpenAI, BadRequestError

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not set the environment variable, replace this with your API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

try:
    completion = client.chat.completions.create(
        model="qwen-doc-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            # Replace '{FILE_ID}' with the file-id from your scenario
            {"role": "system", "content": "fileid://{FILE_ID}"},
            {"role": "user", "content": "From this product manual, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed)."},
        ],
        # This code example uses streaming output to clearly show the model's output process. For non-streaming output examples, see https://www.alibabacloud.com/help/en/model-studio/user-guide/text-generation
        stream=True,
        stream_options={"include_usage": True},
    )
    full_content = ""
    for chunk in completion:
        # The final usage chunk has no choices, so check before reading the delta
        if chunk.choices and chunk.choices[0].delta.content:
            full_content += chunk.choices[0].delta.content
        print(chunk.model_dump())
    print(full_content)
except BadRequestError as e:
    print(f"Error message: {e}")
    print("For more information, see https://www.alibabacloud.com/help/en/model-studio/developer-reference/error-codes")
```
Java
```java
import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.core.http.StreamResponse;
import com.openai.models.chat.completions.*;

public class Main {
    public static void main(String[] args) {
        // Create a client that reads the API key from the environment variable
        OpenAIClient client = OpenAIOkHttpClient.builder()
                // If you have not set the environment variable, replace the next line with your Model Studio API key: .apiKey("sk-xxx");
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                .build();
        ChatCompletionCreateParams chatParams = ChatCompletionCreateParams.builder()
                .addSystemMessage("You are a helpful assistant.")
                // Replace '{FILE_ID}' with the file-id from your scenario
                .addSystemMessage("fileid://{FILE_ID}")
                .addUserMessage("From this product manual, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed).")
                .model("qwen-doc-turbo")
                .build();
        try (StreamResponse<ChatCompletionChunk> streamResponse = client.chat().completions().createStreaming(chatParams)) {
            streamResponse.stream().forEach(chunk -> {
                // Skip chunks that carry no choices
                if (chunk.choices().isEmpty()) {
                    return;
                }
                String content = chunk.choices().get(0).delta().content().orElse("");
                if (!content.isEmpty()) {
                    System.out.print(content);
                }
            });
        } catch (Exception e) {
            System.err.println("Error message: " + e.getMessage());
        }
    }
}
```
curl
```bash
curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header "Content-Type: application/json" \
--data '{
    "model": "qwen-doc-turbo",
    "messages": [
        {"role": "system","content": "You are a helpful assistant."},
        {"role": "system","content": "fileid://{FILE_ID}"},
        {"role": "user","content": "From this product manual, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed)."}
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    }
}'
```
Pass plain text
You can pass file content directly as a string instead of using a file-id. Put the role-setting system message first in the messages array, followed by a system message that contains the document text, so that the model does not mistake the document for instructions.
If the text exceeds 9,000 tokens, use a file URL or file ID instead, because the request body size is limited.
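The ordering and the 9,000-token limit described above can be enforced before sending. This sketch is hypothetical. The page does not name a tokenizer, so the caller supplies `count_tokens`. Passing `len` (the character count) is a crude stand-in that usually overestimates tokens for English text; replace it with an accurate counter if you have one.

```python
from typing import Callable

MAX_PLAIN_TEXT_TOKENS = 9_000


def build_plain_text_messages(document: str, question: str,
                              count_tokens: Callable[[str], int]) -> list:
    """Role-setting message first, then the document text, then the question."""
    n = count_tokens(document)
    if n > MAX_PLAIN_TEXT_TOKENS:
        raise ValueError(f"document is about {n} tokens; use a file URL or file ID instead")
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "system", "content": document},
        {"role": "user", "content": question},
    ]
```

For example, `build_plain_text_messages(manual_text, prompt, count_tokens=len)` produces the same three-message layout as the examples that follow. Here `manual_text` and `prompt` are your own strings.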
Python
```python
import os
from openai import OpenAI, BadRequestError

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not set the environment variable, replace this with your API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

try:
    completion = client.chat.completions.create(
        model="qwen-doc-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "system", "content": "Smart Office Product Manual Version: V2.0 Release Date: January 2024 Table of Contents 1.1 Product Overview..."},
            {"role": "user", "content": "From this product manual, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed)."},
        ],
        # This code example uses streaming output to clearly show the model's output process. For non-streaming output examples, see https://www.alibabacloud.com/help/en/model-studio/user-guide/text-generation
        stream=True,
        stream_options={"include_usage": True},
    )
    full_content = ""
    for chunk in completion:
        # The final usage chunk has no choices, so check before reading the delta
        if chunk.choices and chunk.choices[0].delta.content:
            full_content += chunk.choices[0].delta.content
        print(chunk.model_dump())
    print(full_content)
except BadRequestError as e:
    print(f"Error message: {e}")
    print("For more information, see https://www.alibabacloud.com/help/en/model-studio/developer-reference/error-codes")
```
Java
```java
import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.core.http.StreamResponse;
import com.openai.models.chat.completions.*;

public class Main {
    public static void main(String[] args) {
        // Create a client that reads the API key from the environment variable
        OpenAIClient client = OpenAIOkHttpClient.builder()
                // If you have not set the environment variable, replace the next line with your Model Studio API key: .apiKey("sk-xxx");
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                .build();
        ChatCompletionCreateParams chatParams = ChatCompletionCreateParams.builder()
                .addSystemMessage("You are a helpful assistant.")
                .addSystemMessage("Smart Office Product Manual Version: V2.0 Release Date: January 2024 Table of Contents 1.1 Product Overview...")
                .addUserMessage("From this product manual, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed).")
                .model("qwen-doc-turbo")
                .build();
        try (StreamResponse<ChatCompletionChunk> streamResponse = client.chat().completions().createStreaming(chatParams)) {
            streamResponse.stream().forEach(chunk -> {
                // Skip chunks that carry no choices
                if (chunk.choices().isEmpty()) {
                    return;
                }
                String content = chunk.choices().get(0).delta().content().orElse("");
                if (!content.isEmpty()) {
                    System.out.print(content);
                }
            });
        } catch (Exception e) {
            System.err.println("Error message: " + e.getMessage());
        }
    }
}
```
curl
```bash
curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header "Content-Type: application/json" \
--data '{
    "model": "qwen-doc-turbo",
    "messages": [
        {"role": "system","content": "You are a helpful assistant."},
        {"role": "system","content": "Smart Office Product Manual Version: V2.0 Release Date: January 2024 Table of Contents 1.1 Product Overview..."},
        {"role": "user","content": "From this product manual, extract all product information and organize it into a standard JSON array. Each object must include the following: model (the product model), name (the product name), and price (the price, with currency symbols and commas removed)."}
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    }
}'
```
Model pricing
| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per million tokens) | Output cost (per million tokens) | Free quota |
|---|---|---|---|---|---|---|
| qwen-doc-turbo | 262,144 | 253,952 | 32,768 | $0.087 | $0.144 | No free quota |
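Because input and output tokens are billed at different per-million rates, the cost of a request is `(input_tokens × 0.087 + output_tokens × 0.144) / 1,000,000` US dollars. The sketch below applies the prices from the table. The token counts in the example are illustrative, not measured. With streaming and `include_usage`, the final chunk's `usage.prompt_tokens` and `usage.completion_tokens` are natural inputs.

```python
INPUT_USD_PER_MILLION = 0.087   # qwen-doc-turbo input price from the table
OUTPUT_USD_PER_MILLION = 0.144  # qwen-doc-turbo output price from the table


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated charge in US dollars for one request."""
    return (input_tokens * INPUT_USD_PER_MILLION
            + output_tokens * OUTPUT_USD_PER_MILLION) / 1_000_000


# Illustrative request: a 200,000-token document and a 2,000-token answer.
print(f"{estimate_cost_usd(200_000, 2_000):.6f}")  # prints 0.017688
```

That is 200,000 × 0.087 = 17,400 plus 2,000 × 0.144 = 288, so 17,688 / 1,000,000, or roughly $0.0177.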
FAQ
- Where are files stored after being uploaded through the OpenAI-compatible file interface?
  Files uploaded through the OpenAI-compatible interface are stored free of charge in your Model Studio bucket. To query and manage them, see OpenAI file interface.
- When using the file URL method, how do the `file_parsing_strategy` options differ?
  `"auto"`: parses the file automatically based on its content. `"text_only"`: parses text only. `"text_and_images"`: parses both text and images, which increases parsing time.
- How can I tell whether a file has finished parsing?
  Start a conversation with the file ID. If the file is still being parsed, the API returns `File parsing in progress, please try again later.`; wait and retry. If the call succeeds, the file is ready.
- Does parsing a file after upload incur extra costs?
  No. Document parsing is free of charge.
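The "retry after a delay" advice for files that are still being parsed can be wrapped in a small loop. The sketch below is hypothetical, and it assumes the SDK includes the service's message text in the exception string. Confirm this for the SDK you use; for example, the OpenAI Python SDK raises `BadRequestError`. Any other error is re-raised immediately. The delay grows by `backoff` after each attempt, and `sleep` is injectable so that the loop can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Prefix of the message the FAQ says is returned while a file is being parsed.
PARSING_MESSAGE = "File parsing in progress"


def call_when_parsed(call: Callable[[], T], attempts: int = 10,
                     delay_s: float = 5.0, backoff: float = 1.5,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` while the service reports that the file is still being parsed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Re-raise unrelated errors, and give up after the last attempt.
            if PARSING_MESSAGE not in str(exc) or attempt == attempts:
                raise
            sleep(delay_s)
            delay_s *= backoff
    raise RuntimeError("unreachable")
```

Usage: `call_when_parsed(lambda: client.chat.completions.create(model="qwen-doc-turbo", messages=msgs))`, where `client` and `msgs` come from the file ID example.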
API reference
For the input and output parameters of Qwen-Doc-Turbo, see Qwen API reference.
Error codes
If the model call fails and returns an error message, see Error messages for resolution.
Limitations
- SDK dependencies:
  - File URL (`doc_url`): supports only the DashScope protocol. Use the DashScope Python SDK or HTTP calls (such as curl).
  - Uploaded file (`file-id`): you must use an OpenAI-compatible SDK to upload and manage files.
- File upload and reference:
  - File URL (`doc_url`): up to 10 URLs per request. URLs must be publicly accessible.
  - Uploaded file (`file-id`): max 150 MB per file. Account limits: 10,000 files or 100 GB in total (files never expire). Each request can reference only one file. Upload requests fail when a limit is reached; delete files you no longer need to free up quota. For details, see OpenAI compatible - File.
  - Supported formats: TXT, DOC, DOCX, PDF, XLS, XLSX, MD, PPT, PPTX, JPG, JPEG, PNG, GIF, and BMP.
- API input:
  - Using `doc_url` or `file-id`: max 262,144 tokens.
  - Plain text in `user` or `system` messages: max 9,000 tokens per message.
- API output:
  - The maximum output length is 32,768 tokens.
- File sharing:
  - A `file-id` works only within the account that generated it. It cannot be used across accounts or with a RAM user's API key.
- Rate limit: See Rate limits.