Alibaba Cloud Model Studio: Data mining (Qwen-Doc-Turbo)

Last Updated: Nov 11, 2025

The data mining model is designed for information extraction, content moderation, classification, and summary generation. Unlike general-purpose conversational models, it quickly and accurately returns standardized structured data, such as JSON. This addresses the common problems of general-purpose models returning inconsistently structured responses or extracting information inaccurately.

Note

This document applies only to the China (Beijing) region. You must use an API key from the China (Beijing) region.

Methods

Qwen-Doc-Turbo supports information extraction from files using the following three methods. A condensed comparison of the three request shapes follows this list.

  • Pass a file URL (Recommended):

    • Provide a public URL for the file directly in the API request, which allows the model to access and parse the content. This method supports passing up to 10 files at a time and is the only method that supports processing multiple files. You can specify the parsing policy (auto, text_only, or text_and_images) using the file_parsing_strategy parameter.

    • SDK: The file URL method currently supports only the DashScope protocol. You can use the DashScope Python SDK or make an HTTP call, such as using curl.

  • Pass a file ID:

    • Upload a local file to Model Studio to generate a unique file-id for your Alibaba Cloud account and start the parsing process. Then, reference this ID in subsequent API requests. This method is compatible with the OpenAI SDK. It is suitable for scenarios where you need to reuse the same file or process local files.

    • SDK: Use the OpenAI SDK for file uploads and management. Model calls are compatible with both the OpenAI SDK and the DashScope SDK.

  • Pass plain text:

    • For short or temporary text content, pass it directly as part of a system message.

    • SDK: Compatible with the OpenAI SDK and the DashScope SDK.
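
The three methods differ mainly in how the document is attached to the request. The following condensed Python sketch, distilled from the full examples in the sections below, compares the messages structure of each method. The URLs, the file-id, and the shortened prompt are placeholders, not working values.

# Condensed comparison of the three request shapes (placeholders only; see the full examples below).

# 1. File URL: the document is attached as a doc_url content block inside the user message.
url_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "text", "text": "Extract all feedback information as a standard JSON array."},
        {"type": "doc_url",
         "doc_url": ["https://example.com/customer_feedback_report.txt"],  # placeholder; up to 10 URLs
         "file_parsing_strategy": "auto"},  # auto, text_only, or text_and_images
    ]},
]

# 2. File ID: the uploaded document is referenced in a second system message.
file_id_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "system", "content": "fileid://{FILE_ID}"},  # replace {FILE_ID} with the returned file-id
    {"role": "user", "content": "Extract all feedback information as a standard JSON array."},
]

# 3. Plain text: the document content itself is passed in a second system message.
plain_text_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "system", "content": "Feedback ID: 001 User: Zhang Wei ..."},  # raw document text
    {"role": "user", "content": "Extract all feedback information as a standard JSON array."},
]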

Prerequisites

You have obtained an API key from the China (Beijing) region and configured it as the DASHSCOPE_API_KEY environment variable.

Pass a file URL

Extract structured data directly using file URLs. You can process up to 10 files in a single request. This example uses the customer_feedback_report.txt file.

The file URL method currently supports only the DashScope protocol. You can use the DashScope Python SDK or make an HTTP call, such as using curl.

Python

import os
import dashscope

response = dashscope.Generation.call(
    api_key=os.getenv('DASHSCOPE_API_KEY'), # If you have not configured the environment variable, replace this with your API key
    model='qwen-doc-turbo',
    messages=[
    {"role": "system","content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "From this customer feedback report, extract all feedback information and organize it into a standard JSON array. Each object must include the following: feedback_id (string), product_name (string), user_name (string), rating_score (an integer from 1 to 5), feedback_type (string), and summary (a Chinese summary of no more than 30 characters)."
            },
            {
                "type": "doc_url",
                "doc_url": [
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250910/gokhyx/%E5%AE%A2%E6%88%B7%E5%8F%8D%E9%A6%88%E6%8A%A5%E5%91%8A.txt"
                ],
                "file_parsing_strategy": "auto"
            }
        ]
    }]
)
try:
    if response.status_code == 200:
        print(response.output.choices[0].message.content)
    else:
        print(f"Request failed, status code: {response.status_code}")
        print(f"Error code: {response.code}")
        print(f"Error message: {response.message}")
        print("For more information, see https://www.alibabacloud.com/help/en/model-studio/developer-reference/error-codes")
except Exception as e:
    print(f"An error occurred: {e}")
    print("For more information, see https://www.alibabacloud.com/help/en/model-studio/developer-reference/error-codes")

curl

curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'X-DashScope-SSE: enable' \
--data '{
    "model": "qwen-doc-turbo",
    "input": {
        "messages": [
                {
                    "role": "system",
                    "content": "you are a helpful assistant."
                },
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "From this customer feedback report, extract all feedback information and organize it into a standard JSON array. Each object must include the following: feedback_id (string), product_name (string), user_name (string), rating_score (an integer from 1 to 5), feedback_type (string), and summary (a Chinese summary of no more than 30 characters)."
                        },
                        {
                            "type": "doc_url",
                            "doc_url": [
                                "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250910/gokhyx/%E5%AE%A2%E6%88%B7%E5%8F%8D%E9%A6%88%E6%8A%A5%E5%91%8A.txt"
                            ],
                            "file_parsing_strategy": "auto"
                        }
                    ]
                }
            ]
    }
}'
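
The example above passes a single URL. Because this is the only method that supports multiple files, you can extract from several documents in one request by listing up to 10 URLs in the doc_url array. The following Python fragment of the content block is only illustrative; the URLs are placeholders.

# Illustrative doc_url content block with multiple files (placeholder URLs).
{
    "type": "doc_url",
    "doc_url": [
        "https://example.com/report_part_1.txt",  # placeholder
        "https://example.com/report_part_2.txt",  # placeholder; up to 10 URLs per request
    ],
    "file_parsing_strategy": "auto",
}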

Pass a file ID

Upload a file

This example uses the customer_feedback_report.txt file. Upload the file to the secure bucket in Alibaba Cloud Model Studio through the OpenAI compatible interface to obtain the returned file-id. For more information about the parameters and call methods for the file upload interface, see the API reference.

Python

import os
from pathlib import Path
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not configured the environment variable, replace this with your API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # Enter the DashScope service base_url
)

file_object = client.files.create(file=Path("customer_feedback_report.txt"), purpose="file-extract")
# Print the file-id for use in subsequent model conversations
print(file_object.id)

Java

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.models.files.*;

import java.nio.file.Paths;

public class Main {
    public static void main(String[] args) {
        // Create a client and use the API key from the environment variable
        OpenAIClient client = OpenAIOkHttpClient.builder()
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                .build();
        // Set the file path. Modify the path and file name as needed.
        Path filePath = Paths.get("src/main/java/org/example/customer_feedback_report.txt");
        // Create file upload parameters
        FileCreateParams fileParams = FileCreateParams.builder()
                .file(filePath)
                .purpose(FilePurpose.of("file-extract"))
                .build();

        // Upload the file and print the file-id
        FileObject fileObject = client.files().create(fileParams);
        // Print the file-id for use in subsequent model conversations
        System.out.println(fileObject.id());
    }
}

curl

curl --location --request POST 'https://dashscope.aliyuncs.com/compatible-mode/v1/files' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --form 'file=@"customer_feedback_report.txt"' \
  --form 'purpose="file-extract"'

Run the code to obtain the file-id for the uploaded file.

Pass information and start a conversation using a file ID

Embed the obtained file-id into a system message. The first system message sets the role for the model. The subsequent system message passes the file-id. The user message contains the specific query about the file.

Python

import os
from openai import OpenAI, BadRequestError

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"), # If you have not configured the environment variable, replace this with your API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

try:
    completion = client.chat.completions.create(
        model="qwen-doc-turbo",
        messages=[
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            # Replace '{FILE_ID}' with the file-id used in your actual conversation scenario
            {'role': 'system', 'content': 'fileid://{FILE_ID}'},
            {'role': 'user', 'content': 'From this customer feedback report, extract all feedback information and organize it into a standard JSON array. Each object must include the following: feedback_id (string), product_name (string), user_name (string), rating_score (an integer from 1 to 5), feedback_type (string), and summary (a Chinese summary of no more than 30 characters).'}
        ],
        # This code example uses streaming output to clearly and intuitively show the model's output process. For non-streaming output examples, see https://www.alibabacloud.com/help/en/model-studio/user-guide/text-generation
        stream=True,
        stream_options={"include_usage": True}
    )

    full_content = ""
    for chunk in completion:
        if chunk.choices and chunk.choices[0].delta.content:
            full_content += chunk.choices[0].delta.content
            print(chunk.model_dump())
    
    print(full_content)

except BadRequestError as e:
    print(f"Error message: {e}")
    print("For more information, see https://www.alibabacloud.com/help/en/model-studio/developer-reference/error-codes")

Java

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.core.http.StreamResponse;
import com.openai.models.chat.completions.*;

public class Main {
    public static void main(String[] args) {
        // Create a client and use the API key from the environment variable
        OpenAIClient client = OpenAIOkHttpClient.builder()
                // If you have not configured the environment variable, replace the next line with your Alibaba Cloud Model Studio API key: .apiKey("sk-xxx");
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                .build();

        ChatCompletionCreateParams chatParams = ChatCompletionCreateParams.builder()
                .addSystemMessage("You are a helpful assistant.")
                // Replace '{FILE_ID}' with the file-id used in your actual conversation scenario
                .addSystemMessage("fileid://{FILE_ID}")
                .addUserMessage("From this customer feedback report, extract all feedback information and organize it into a standard JSON array. Each object must include the following: feedback_id (string), product_name (string), user_name (string), rating_score (an integer from 1 to 5), feedback_type (string), and summary (a Chinese summary of no more than 30 characters).")
                .model("qwen-doc-turbo")
                .build();

        try (StreamResponse<ChatCompletionChunk> streamResponse = client.chat().completions().createStreaming(chatParams)) {
            streamResponse.stream().forEach(chunk -> {
                String content = chunk.choices().get(0).delta().content().orElse("");
                if (!content.isEmpty()) {
                    System.out.print(content);
                }
            });
        } catch (Exception e) {
            System.err.println("Error message: " + e.getMessage());
        }
    }
}

curl

curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header "Content-Type: application/json" \
--data '{
    "model": "qwen-doc-turbo",
    "messages": [
        {"role": "system","content": "You are a helpful assistant."},
        {"role": "system","content": "fileid://{FILE_ID}"},
        {"role": "user","content": "From this customer feedback report, extract all feedback information and organize it into a standard JSON array. Each object must include the following: feedback_id (string), product_name (string), user_name (string), rating_score (an integer from 1 to 5), feedback_type (string), and summary (a Chinese summary of no more than 30 characters)."}
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    }
}'

Pass plain text

In addition to passing file information using a file-id, you can also pass the file content directly as a string. When using this method, to prevent the model from confusing the role setting with the file content, ensure that the role-setting information is in the first message of the messages array.

Because of API request body size limits, if your text content exceeds 9,000 tokens, pass the information using a file ID.
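
If you want a quick client-side check before choosing between the two approaches, the following sketch uses a conservative character-count heuristic. This is an assumption for illustration only: actual token counts depend on the model's tokenizer and are typically lower than the character count, so content that passes this check is very unlikely to exceed the limit.

# Conservative pre-check: treat every character as one token, which overestimates
# the token count for both Chinese and English text.
with open("customer_feedback_report.txt", encoding="utf-8") as f:
    document_text = f.read()

if len(document_text) > 9000:
    print("Content may exceed the 9,000-token limit; upload the file and pass a file-id instead.")
else:
    print("Content is small enough to pass inline as a system message.")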

Python

import os
from openai import OpenAI, BadRequestError

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"), # If you have not configured the environment variable, replace this with your API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

try:
    completion = client.chat.completions.create(
        model="qwen-doc-turbo",
        messages=[
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'system', 'content': 'Feedback ID: 001 User: Zhang Wei (vip_zhang@example.com) Product: Model Studio AI-Writer Pro...'},
            {'role': 'user', 'content': 'From this customer feedback report, extract all feedback information and organize it into a standard JSON array. Each object must include the following: feedback_id (string), product_name (string), user_name (string), rating_score (an integer from 1 to 5), feedback_type (string), and summary (a Chinese summary of no more than 30 characters).'}
        ],
        # This code example uses streaming output to clearly and intuitively show the model's output process. For non-streaming output examples, see https://www.alibabacloud.com/help/en/model-studio/user-guide/text-generation
        stream=True,
        stream_options={"include_usage": True}
    )

    full_content = ""
    for chunk in completion:
        if chunk.choices and chunk.choices[0].delta.content:
            full_content += chunk.choices[0].delta.content
            print(chunk.model_dump())
    
    print(full_content)

except BadRequestError as e:
    print(f"Error message: {e}")
    print("For more information, see https://www.alibabacloud.com/help/en/model-studio/developer-reference/error-codes")

Java

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.core.http.StreamResponse;
import com.openai.models.chat.completions.*;

public class Main {
    public static void main(String[] args) {
        // Create a client and use the API key from the environment variable
        OpenAIClient client = OpenAIOkHttpClient.builder()
                // If you have not configured the environment variable, replace the next line with your Alibaba Cloud Model Studio API key: .apiKey("sk-xxx");
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                .build();

        ChatCompletionCreateParams chatParams = ChatCompletionCreateParams.builder()
                .addSystemMessage("You are a helpful assistant.")
                .addSystemMessage("Feedback ID: 001 User: Zhang Wei (vip_zhang@example.com) Product: Model Studio AI-Writer Pro...")
                .addUserMessage("From this customer feedback report, extract all feedback information and organize it into a standard JSON array. Each object must include the following: feedback_id (string), product_name (string), user_name (string), rating_score (an integer from 1 to 5), feedback_type (string), and summary (a Chinese summary of no more than 30 characters).")
                .model("qwen-doc-turbo")
                .build();

        try (StreamResponse<ChatCompletionChunk> streamResponse = client.chat().completions().createStreaming(chatParams)) {
            streamResponse.stream().forEach(chunk -> {
                String content = chunk.choices().get(0).delta().content().orElse("");
                if (!content.isEmpty()) {
                    System.out.print(content);
                }
            });
        } catch (Exception e) {
            System.err.println("Error message: " + e.getMessage());
        }
    }
}

curl

curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header "Content-Type: application/json" \
--data '{
    "model": "qwen-doc-turbo",
    "messages": [
        {"role": "system","content": "You are a helpful assistant."},
        {"role": "system","content": "Feedback ID: 001 User: Zhang Wei (vip_zhang@example.com) Product: Model Studio AI-Writer Pro..."},
        {"role": "user","content": "From this customer feedback report, extract all feedback information and organize it into a standard JSON array. Each object must include the following: feedback_id (string), product_name (string), user_name (string), rating_score (an integer from 1 to 5), feedback_type (string), and summary (a Chinese summary of no more than 30 characters)."}
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    }
}'

Model pricing

Model: qwen-doc-turbo
Context window: 262,144 tokens
Max input: 253,952 tokens
Max output: 8,192 tokens
Input cost: $0.087 per million tokens
Output cost: $0.144 per million tokens
Free quota: No free quota

FAQ

  1. Where are files stored after being uploaded through the OpenAI compatible file interface?

    All files uploaded through the OpenAI compatible file interface are stored free of charge in the Alibaba Cloud Model Studio bucket under your Alibaba Cloud account. For more information about how to query and manage uploaded files, see OpenAI file interface. A file management sketch follows this FAQ list.

  2. When uploading using the file URL method, what are the differences between the file_parsing_strategy parameter options?

    When the parsing strategy is set to "auto", the system automatically parses the file based on its content. When set to "text_only", the system parses only text content. When set to "text_and_images", the system parses all images and text content, which increases the parsing time.

  3. How can I determine if a file has finished parsing?

    After you obtain a file-id, you can try to start a conversation with the model using that file-id. If the file has not finished parsing, the system returns error code 400 with the message "File parsing in progress, please try again later." If the model call succeeds and returns a response, the file has finished parsing. A polling sketch follows this FAQ list.

  4. Does the parsing process after file upload incur any extra costs?

    Document parsing is free of charge.
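
The following Python sketch expands on FAQ items 1 and 3. It uses the same OpenAI-compatible endpoint as the earlier examples and assumes that the file interface exposes the standard OpenAI list and delete operations; treat it as an illustration and check the OpenAI file interface reference for the operations that are actually supported. The file-id placeholder and the polling interval are illustrative.

import os
import time
from openai import OpenAI, BadRequestError

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not configured the environment variable, replace this with your API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

# FAQ 1: query and manage uploaded files (assumes the standard OpenAI list/delete operations).
for f in client.files.list():
    print(f.id, f.filename, f.bytes)
# client.files.delete("file-xxx")  # uncomment and replace with a real file-id to delete a file

# FAQ 3: poll until a file has finished parsing by retrying the model call.
# Replace '{FILE_ID}' with the file-id returned by the upload step.
for attempt in range(10):
    try:
        completion = client.chat.completions.create(
            model="qwen-doc-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "system", "content": "fileid://{FILE_ID}"},
                {"role": "user", "content": "Summarize this document in one sentence."},
            ],
        )
        print(completion.choices[0].message.content)  # success: the file has finished parsing
        break
    except BadRequestError as e:
        # A 400 error stating that parsing is in progress means the file is not ready yet.
        print(f"File not ready yet (attempt {attempt + 1}): {e}")
        time.sleep(5)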

API reference

For the input and output parameters of Qwen-Doc-Turbo, see Qwen API reference.

Error codes

If a call fails, see Error messages for troubleshooting.

Limitations

  • SDK dependencies:

    • File URL (doc_url): The file URL method currently supports only the DashScope protocol. You can use the DashScope Python SDK or make an HTTP call, such as using curl.

    • Upload file (file-id): File upload and management operations must use an OpenAI compatible SDK.

  • File upload and reference:

    • File URL (doc_url): Pass files as publicly accessible URLs. A single request can include up to 10 file URLs.

    • Upload file (file-id): The maximum size of a single file is 150 MB. Each Alibaba Cloud account is limited to 10,000 uploaded files, with a total size of up to 100 GB. Uploaded files do not expire. Each request can reference only one file.

    • Supported formats: TXT, DOC, DOCX, PDF, XLS, XLSX, MD, PPT, PPTX, JPG, JPEG, PNG, GIF, and BMP.

  • API input:

    • When passing information using doc_url or a file-id, the maximum context length is 262,144 tokens.

    • When entering plain text directly in a user or system message, the content of a single message is limited to 9,000 tokens.

  • API output:

    • The maximum output length is 8,192 tokens.

  • File sharing:

    • A file-id is valid only within the Alibaba Cloud account that generated it. It cannot be used across accounts or called using the API key of a RAM user.

  • Rate limiting: See Rate limits.