The Batch API is designed for scenarios that do not require real-time responses. It processes large volumes of requests asynchronously, costs only 50% of the price of real-time calls, and is OpenAI compatible. This makes it ideal for batch jobs such as evaluations and large-scale data labeling.
Workflow
Asynchronous batch processing:
Submit a task: Upload a file that contains multiple requests to create a batch task.
Asynchronous processing: The system processes tasks from a queue in the background. You can query the task progress and status in the console or using the API.
Download the results: After the task is complete, the system generates a result file with successful responses and an error file with details about any failures.
Availability
Beijing region
Supported models:
Text generation models: Stable and some latest versions of Qwen Max, Plus, Flash, and Long. Also supports the QwQ series (qwq-plus) and third-party models such as deepseek-r1 and deepseek-v3.
Multimodal models: Stable and some latest versions of Qwen VL Max, Plus, and Flash. Also supports the Qwen OCR model.
Text embedding models: The text-embedding-v4 model.
Singapore region
Supported models: qwen-max, qwen-plus, and qwen-turbo.
Getting started
Step 1: Prepare your batch file
Prepare a UTF-8 encoded .jsonl file that meets the following requirements:
Format: One JSON object per line, each describing an individual request.
Size limit: Up to 50,000 requests per file and no larger than 500 MB.
If your data exceeds these limits, split it into smaller batch files.
Line limit: Each JSON object can be up to 6 MB, and each request must fit within the model's context window.
Consistency: All requests in a file must target the same API endpoint (url) and use the same model (body.model).
Unique identifier: Each request requires a custom_id that is unique within the file, which can be used to reference results after completion.
Request example
The following sample contains 2 requests sent to Qwen-Max:
{"custom_id":"1","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-max","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Hello!"}]}}
{"custom_id":"2","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-max","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"}]}}
JSONL batch generation tool
Use this tool to quickly generate JSONL files. To avoid performance issues, do not process more than 10,000 rows at a time. If you have a large data volume, process the data in batches.
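If you generate the file in code instead, the following Python sketch writes one request per line in the same format as the sample above. The prompts, model, and file name are placeholder assumptions:

```python
import json

# Placeholder input data: the prompts you want to batch.
prompts = ["Hello!", "What is 2+2?"]

with open("batch_input.jsonl", "w", encoding="utf-8") as f:
    for i, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": str(i),  # must be unique within the file
            "method": "POST",
            "url": "/v1/chat/completions",  # same endpoint on every line
            "body": {
                "model": "qwen-max",  # same model on every line
                "messages": [
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": prompt},
                ],
            },
        }
        f.write(json.dumps(request, ensure_ascii=False) + "\n")
```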
Step 2: Create the batch
Create and manage batch tasks through the console or the Batch API.
Console
(1) Create a batch
On the Batches page, click Create Batch Task.
In the dialog box that appears, enter a Task Name and Task Description. Set the Maximum Waiting Time (from 1 to 14 days) and upload the JSONL file.
Click Download Sample File for a template.

Click Confirm.
(2) View and manage batches
View:
The task list page shows the Progress (processed requests/total requests) and Status of each batch.
To quickly find a batch, search by task name or ID, or filter by workspace.

Manage:
Cancel: Use the Actions column to cancel tasks in the `in_progress` status.
Troubleshoot: For tasks with the `failed` status, hover over the status to view a summary. Download the error file to view the details.

(3) Download and analyze the results
After a task is complete, click View Results to download the output files:
Result file: Contains all successful requests and their response results.
Error file (if any): Contains all failed requests and their error details.
Both files contain the custom_id field. Use it to match each result or error back to the original input data.
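For example, here is a minimal Python sketch that pairs each downloaded result with its original request by custom_id. The file names are placeholders, and the nested response path assumes the OpenAI-compatible batch output layout:

```python
import json

def load_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

# Index the original requests by custom_id.
inputs = {req["custom_id"]: req for req in load_jsonl("batch_input.jsonl")}

for result in load_jsonl("batch_output.jsonl"):
    request = inputs[result["custom_id"]]
    prompt = request["body"]["messages"][-1]["content"]
    # Adjust this path if your result file's layout differs.
    answer = result["response"]["body"]["choices"][0]["message"]["content"]
    print(result["custom_id"], repr(prompt), "->", repr(answer))
```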
API
For production environments that require automation and integration, use the OpenAI-compatible Batch API. Core workflow:
Create a batch: Call the POST /v1/batches endpoint to create a task and record the returned batch_id.
Poll the status: Use the batch_id to poll the GET /v1/batches/{batch_id} endpoint. When the status field changes to completed, record the returned output_file_id and stop polling.
Download the results: Use the output_file_id to call the GET /v1/files/{output_file_id}/content endpoint to download the result file.
For API definitions, parameters, and code examples, see Batch API reference.
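For orientation only, here is a sketch of that workflow with the OpenAI Python SDK. The base_url shown is the Beijing-region compatible-mode endpoint; the file names, completion window, and polling interval are assumptions:

```python
import os
import time

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # your Model Studio API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

# 1. Upload the JSONL batch file.
input_file = client.files.create(
    file=open("batch_input.jsonl", "rb"), purpose="batch"
)

# 2. Create the batch and record its ID.
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Poll until the batch reaches a terminal status.
while batch.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(60)
    batch = client.batches.retrieve(batch.id)

# 4. Download the result file and, if present, the error file.
if batch.output_file_id:
    client.files.content(batch.output_file_id).write_to_file("batch_output.jsonl")
if batch.error_file_id:
    client.files.content(batch.error_file_id).write_to_file("batch_errors.jsonl")
```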
Step 3: View data statistics (Optional)
On the Model Observation page, filter and view usage statistics for batches.
View data overview: Select a Time range (up to 30 days). Set Inference Type to Batch Inference:
Monitoring data: Aggregate statistics for all models within the time range, such as total calls and total failures.
Models: Detailed data for each model, such as total calls, failure rate, and average call duration.

To view inference data from more than 30 days ago, go to the Bills page.
View model details: In the Models list, click Monitor in the Actions column for a specific model to view its Call Statistics details, such as the total call count and usage.

Batch usage data is recorded at the task end time and may be delayed by 1 to 2 hours. Data for an in_progress task is unavailable until the task completes.
Monitoring data has a delay of 1 to 2 hours.
Status of a batch
validating: The batch file is being validated against the JSONL specification and the API format requirements.
in_progress: The batch file has been validated and is being processed.
completed: The batch has completed. The output and error files are ready for download.
failed: The batch file has failed the validation process. This is typically caused by file-level errors, such as an invalid JSONL format or an oversized file. No requests are processed, and no output file is generated.
expired: The batch could not be completed within the maximum waiting time set at creation. To avoid this, set a longer waiting time when you create the batch.
cancelled: The batch has been cancelled. Unprocessed requests are terminated.
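Continuing the API sketch above (the `batch` object comes from that polling loop), a client might branch on the terminal status like this:

```python
# Act on the terminal status reached by the polling loop above.
if batch.status == "completed":
    print("Done: download the output file and, if present, the error file.")
elif batch.status == "failed":
    print("File-level validation error: fix the JSONL file and resubmit.")
elif batch.status == "expired":
    print("Maximum waiting time exceeded: resubmit with a longer window.")
elif batch.status == "cancelled":
    print("Cancelled: requests finished before cancellation are still billed.")
```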
Billing
Unit price: The input and output tokens of successful requests are billed at 50% of the model's standard synchronous API price. For pricing details, see Models.
Scope:
Only successfully executed requests in a task are billed.
File parsing failures, execution failures, or row-level request errors do not incur charges.
For cancelled tasks, requests that successfully completed before the cancellation are still billed.
Batch tasks are billed separately and do not support savings plans, new-user free quotas, or features such as context cache.
FAQ
Do I need to purchase or enable anything extra to use batch inference?
No. Once Alibaba Cloud Model Studio is activated, you can call the Batch API with your API Key. Usage is billed pay-as-you-go and deducted from your account balance.
Why does my task fail immediately after submission (status changes to failed)?
This is usually caused by a file-level error. Check the following:
Format: The file must be in strict JSONL format, with one complete JSON object per line.
Size: The file size and line count must not exceed the limits in Step 1: Prepare your batch file.
Model consistency: body.model must be identical for all requests in the file, and the model must support batches in your region.
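A quick pre-submission check along these lines can catch format and consistency problems before you upload. This is a sketch; the file name is a placeholder:

```python
import json

models, urls, ids, n = set(), set(), set(), 0
with open("batch_input.jsonl", encoding="utf-8") as f:
    for n, line in enumerate(f, start=1):
        try:
            req = json.loads(line)  # each line must be one complete JSON object
        except json.JSONDecodeError as exc:
            raise SystemExit(f"line {n}: invalid JSON ({exc})")
        ids.add(req["custom_id"])
        models.add(req["body"]["model"])
        urls.add(req["url"])

# All requests must share one model and one endpoint, with unique custom_ids.
assert len(models) == 1, f"mixed models: {models}"
assert len(urls) == 1, f"mixed endpoints: {urls}"
assert len(ids) == n, "duplicate custom_id values"
```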
How long does it take to process a task?
It depends on system load. Under heavy load, batches may wait in a queue for resources. Results are returned within the maximum waiting time you set, whether the batch succeeds or fails.
Error codes
If a call fails and returns an error message, see Error messages for solutions.