
Alibaba Cloud Model Studio: Batch processing

Last Updated: Dec 04, 2025

The Batch API is designed for scenarios that do not require real-time responses. It processes large volumes of requests asynchronously, costs only 50% of the price of real-time calls, and is OpenAI compatible. This makes it ideal for batch jobs such as model evaluation and large-scale data labeling.

Workflow

Asynchronous batch processing:

  1. Submit a task: Upload a file that contains multiple requests to create a batch task.

  2. Asynchronous processing: The system processes tasks from a queue in the background. You can query the task progress and status in the console or using the API.

  3. Download the results: After the task is complete, the system generates a result file with successful responses and an error file with details about any failures.

Availability

Beijing region

Supported models:

  • Text generation models: Stable and some of the latest versions of Qwen Max, Plus, Flash, and Long. Also supports the QwQ series (qwq-plus) and third-party models such as deepseek-r1 and deepseek-v3.

  • Multimodal models: Stable and some of the latest versions of Qwen VL Max, Plus, and Flash. Also supports the Qwen OCR model.

  • Text embedding models: The text-embedding-v4 model.

List of supported model names

Singapore region

Supported models: qwen-max, qwen-plus, and qwen-turbo.

Getting started

Step 1: Prepare your batch file

Prepare a UTF-8 encoded .jsonl file that meets the following requirements:

  • Format: One JSON object per line, each describing an individual request.

  • Size limit: Up to 50,000 requests per file and no larger than 500 MB.

    For files that exceed these limits, split them into smaller batches.
  • Line limit: Each JSON object can be up to 6 MB, and each request must stay within the model's context window.

  • Consistency: All requests in a file must target the same API endpoint (url) and use the same model (body.model).

  • Unique identifier: Each request requires a custom_id unique within the file, which can be used to reference results after completion.

Request example

The following sample contains 2 requests sent to Qwen-Max:

{"custom_id":"1","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-max","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Hello!"}]}}
{"custom_id":"2","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-max","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"}]}}

JSONL batch generation tool

Use this tool to quickly generate JSONL files. To avoid performance issues, do not process more than 10,000 rows at a time. If you have a large data volume, process the data in batches.


Step 2: Create the batch

Create and manage batch tasks through the console or the Batch API.

Console

(1) Create a batch

  1. On the Batches page, click Create Batch Task.

  2. In the dialog box that appears, enter a Task Name and Task Description. Set the Maximum Waiting Time (from 1 to 14 days) and upload the JSONL file.

    Click Download Sample File for a template.


  3. Click Confirm.

(2) View and manage batches

  • View:

    • The task list page shows the Progress (processed requests/total requests) and Status of each batch.

    • To quickly find a batch, search by task name or ID, or filter by workspace.

  • Manage:

    • Cancel: To cancel a task in the `in_progress` status, click Cancel in the Actions column.

    • Troubleshoot: For tasks with the `failed` status, hover over the status to view a summary. Download the error file to view the details.

(3) Download and analyze the results

After a task is complete, click View Results to download the output files:

  • Result file: Contains all successful requests and their response results.

  • Error file (if any): Contains all failed requests and their error details.

Both files contain the custom_id field. Use it to match each result or error back to the original request, as in the sketch below.
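
As an illustration of this matching step, the following Python sketch joins the downloaded files back to the input by custom_id. The file names are placeholders, and the nesting of the model reply assumes the OpenAI-compatible batch output format; adjust if your result lines differ:

import json
import os

def load_jsonl(path):
    # Read a JSONL file into a dict keyed by custom_id.
    records = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            obj = json.loads(line)
            records[obj["custom_id"]] = obj
    return records

# Placeholder file names; use the files you downloaded from the console.
inputs = load_jsonl("batch_input.jsonl")
results = load_jsonl("batch_output.jsonl")
errors = load_jsonl("batch_errors.jsonl") if os.path.exists("batch_errors.jsonl") else {}

for custom_id in inputs:
    if custom_id in results:
        # Assumes the OpenAI-style output format, where the model reply
        # is nested under response.body.choices[0].message.content.
        reply = results[custom_id]["response"]["body"]["choices"][0]["message"]["content"]
        print(f"{custom_id}: {reply}")
    else:
        print(f"{custom_id}: failed -> {errors.get(custom_id)}")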

API

For production environments that require automation and integration, use the OpenAI-compatible Batch API. Core workflow:

  1. Create a batch
    Call the POST /v1/batches endpoint to create a task and record the returned batch_id.

  2. Poll the status
    Use the batch_id to poll the GET /v1/batches/{batch_id} endpoint. When the status field changes to completed, record the returned output_file_id and stop polling.

  3. Download the results
    Use the output_file_id to call the GET /v1/files/{output_file_id}/content endpoint to download the result file.

For API definitions, parameters, and code examples, see Batch API reference.
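
A minimal end-to-end sketch of this workflow with the OpenAI Python SDK is shown below. The compatible-mode base URL, the DASHSCOPE_API_KEY environment variable, the file names, and the completion_window value are assumptions; confirm the exact parameters in the Batch API reference:

import os
import time
from openai import OpenAI

# Assumed compatible-mode endpoint and API key environment variable.
client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

# 1. Upload the JSONL input file, then create the batch task.
input_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",  # must match the url of every request line
    completion_window="24h",          # assumed value for the maximum waiting time
)

# 2. Poll until the task reaches a terminal state.
while batch.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(60)
    batch = client.batches.retrieve(batch.id)

# 3. Download the result file, plus the error file if one was generated.
if batch.output_file_id:
    client.files.content(batch.output_file_id).write_to_file("batch_output.jsonl")
if batch.error_file_id:
    client.files.content(batch.error_file_id).write_to_file("batch_errors.jsonl")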

Step 3: View data statistics (Optional)

On the Model Observation page, filter and view usage statistics for batches.

  1. View data overview: Select a Time range (up to 30 days). Set Inference Type to Batch Inference:

    • Monitoring data: Aggregate statistics for all models within the time range, such as total calls and total failures.

    • Models: Detailed data for each model, such as total calls, failure rate, and average call duration.


    To view inference data from more than 30 days ago, go to the Bills page.
  2. View model details: In the Models list, click Monitor in the Actions column for a specific model to view its Call Statistics details, such as the total call count and usage.

Important
  • Batch usage data is recorded at the task end time and may be delayed by 1 to 2 hours. Usage data for an in_progress task is unavailable until the task completes.

  • Monitoring data has a delay of 1 to 2 hours.

Status of a batch

  • validating: The batch file is being validated against the JSONL specification and the API format requirements.

  • in_progress: The batch file has been validated and is being processed.

  • completed: The batch has completed. The output and error files are ready for download.

  • failed: The batch file has failed the validation process. This is typically caused by file-level errors, such as an invalid JSONL format or an oversized file. No requests are processed, and no output file is generated.

  • expired: The batch could not be completed within the maximum waiting time set at creation. If tasks frequently expire, set a longer waiting time.

  • cancelled: The batch has been cancelled. Unprocessed requests are terminated.

Billing

  • Unit price: Input and output tokens for successful requests are billed at 50% of the model's standard synchronous API price (see the illustrative example after this list). Pricing details: Models.

  • Scope:

    • Only successfully executed requests in a task are billed.

    • File parsing failures, execution failures, or row-level request errors do not incur charges.

    • For canceled tasks, requests that successfully completed before the cancellation are still billed.
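
As an illustration with hypothetical figures: if a model's synchronous price were $1.00 per million input tokens, the same tokens processed through a batch task would be billed at $0.50 per million. Actual per-model prices are listed on the Models page.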

Important

Batch tasks are billed separately and do not support savings plans, new-user free quotas, or features such as context cache.

FAQ

  1. Do I need to purchase or enable anything extra to use batch inference?

    No. Once Alibaba Cloud Model Studio is activated, you can call the Batch API with your API Key. Usage is billed pay-as-you-go and deducted from your account balance.

  2. Why does my task fail immediately after submission (status changes to failed)?

    This is usually caused by a file-level error. Check the following:

    • Format: The file must be in the strict JSONL format, with one complete JSON object per line.

    • Size: The file size and line count must not exceed the limits in Step 1: Prepare your batch file.

    • Model consistency: body.model must be identical for all requests in the file, and the model must support batch processing in your region.

  3. How long does it take to process a task?

    It depends on system load. Under heavy load, batches may wait in a queue for resources. Results are returned within the maximum waiting time you set, whether the batch succeeds or fails.

Error codes

If a call fails and returns an error message, see Error messages for solutions.