For business scenarios that do not require real-time responses, batch inference can process large-scale data offline at 50% of the cost of real-time inference.
Supported models
Text generation models: qwen-max, qwen-plus, qwen-turbo
How to use
1. Create batch inference task
On the Batch Inference page, click Create Batch Task in the upper right corner.
Configure the batch inference task and click OK to submit.
| Batch task | Description |
| --- | --- |
| Upload Data File | Upload a data file containing request information. |
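The data file is typically a JSONL file in which each line describes one independent request. Below is a minimal sketch of building such a file in Python, assuming the OpenAI-compatible Batch request format; the `custom_id`/`method`/`url`/`body` fields and the `/v1/chat/completions` endpoint are assumptions, not taken from this page.

```python
import json

# Hypothetical example: build a JSONL data file in the assumed
# OpenAI-compatible Batch request format. Each line is one request.
requests = [
    {
        "custom_id": f"request-{i}",      # unique ID used to match results back to requests
        "method": "POST",
        "url": "/v1/chat/completions",    # assumed endpoint
        "body": {
            "model": "qwen-turbo",
            "messages": [{"role": "user", "content": question}],
        },
    }
    for i, question in enumerate(["What is batch inference?", "Summarize JSONL in one sentence."])
]

with open("batch_input.jsonl", "w", encoding="utf-8") as f:
    for req in requests:
        f.write(json.dumps(req, ensure_ascii=False) + "\n")
```

In the OpenAI Batch convention, `custom_id` must be unique within the file so that each line in the result file can be matched back to its request.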
2. View and manage batch inference tasks
On the task list page, you can view information about the batch tasks.
The Task Progress column displays "Number of processed requests/Total number of requests".
Task Status:
For tasks in the Validating or Executing state, you can click Cancel Task. For Completed tasks, you can click View Results.
For Failed tasks, hover over the status to check the error message.
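If you prefer to script these checks instead of using the console, the OpenAI-compatible Batch interface (see API reference below) can list and cancel tasks. A hedged sketch using the OpenAI Python SDK; the `base_url`, API key, and task ID are placeholders:

```python
from openai import OpenAI

# Sketch only: base_url and api_key are placeholders for your
# Model Studio endpoint and credentials.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://example.com/compatible-mode/v1",  # placeholder
)

# List recent batch tasks and print their status and progress counts.
for batch in client.batches.list(limit=10):
    progress = ""
    if batch.request_counts:
        progress = f"{batch.request_counts.completed}/{batch.request_counts.total}"
    print(batch.id, batch.status, progress)

# Cancel a task that is still validating or executing (hypothetical ID).
client.batches.cancel("batch_xxx")
```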
Query by workspace
Batch inference tasks are automatically assigned to the workspace of the creator. You can filter tasks by workspace. You can also enter a Task Name or Task ID in the Search Box and click to start the query.
Workspace permission descriptions:
All: View tasks from all workspaces.
Default Workspace: View tasks only from the default workspace.
Name of sub-workspace: View tasks only from the selected workspace.
3. Download result file
After a batch inference task is completed, you can click View Results to check and download the result file.
For Completed or Stopped tasks, the results are saved in the result file, and error messages are saved in the error file.
Failed status indicates that file parsing failed. No result file or error file will be returned. Hover over the status to view the error messages and check your uploaded file accordingly.
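Result and error files can also be fetched programmatically. Below is a sketch using the OpenAI Python SDK, assuming the task exposes `output_file_id` and `error_file_id` as in the OpenAI Batch API; the task ID, `base_url`, and API key are placeholders:

```python
from openai import OpenAI

# Sketch only: base_url and api_key are placeholders for your
# Model Studio endpoint and credentials.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://example.com/compatible-mode/v1",  # placeholder
)

batch = client.batches.retrieve("batch_xxx")  # hypothetical task ID
if batch.status == "completed":
    # Successful responses are saved in the result file, errors in the error file.
    if batch.output_file_id:
        client.files.content(batch.output_file_id).write_to_file("batch_results.jsonl")
    if batch.error_file_id:
        client.files.content(batch.error_file_id).write_to_file("batch_errors.jsonl")
```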
API reference
Model Studio provides an OpenAI-compatible Batch interface that you can use to create and manage batch inference tasks. For details, see Batch.
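For illustration, here is a minimal end-to-end sketch with the OpenAI Python SDK: upload the JSONL data file, then create the batch task. The `base_url`, API key, endpoint, and completion window values are assumptions, not confirmed by this page.

```python
from openai import OpenAI

# Sketch only: base_url and api_key are placeholders for your
# Model Studio endpoint and credentials.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://example.com/compatible-mode/v1",  # placeholder
)

# 1. Upload the JSONL data file prepared earlier.
input_file = client.files.create(
    file=open("batch_input.jsonl", "rb"),
    purpose="batch",
)

# 2. Create the batch inference task that processes the uploaded file.
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",   # assumed endpoint
    completion_window="24h",           # assumed completion window
)
print(batch.id, batch.status)
```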
Statistics
Go to Model Observation and select Batch Inference for Inference Type. Select a time period within the last 15 days (accurate to the second) to view the statistics of all models during that period.
The model call time for batch inference is based on the end time of the task. Therefore, ongoing tasks will not be displayed here.
Click Monitor to the right of a model and select Batch Inference for Inference Type. Select a time period within the last 15 days (accurate to the second) to view the trends of Call Statistics and Performance Metrics for that model.
Billing
The cost of batch inference is 50% of real-time inference. For specific pricing, see Models.
Batch calling does not support discounts such as free quota or context cache.
In each task, only requests that have been successfully executed are billed. Unexecuted requests are not billed.