Alibaba Cloud Model Studio: Batch inference

Last Updated: Apr 22, 2025

For business scenarios that do not require real-time responses, batch inference can process large-scale data offline at 50% of the cost of real-time inference.

Supported models

Text generation models: qwen-max, qwen-plus, qwen-turbo

How to use

1. Create batch inference task

On the Batch Inference page, click Create Batch Task in the upper right corner.

Configure the batch inference task and click OK to submit.

Upload Data File: Upload a data file containing request information.

  • Only one file can be uploaded at a time.

  • Make sure the file meets the format requirements. You can click Download Sample File for reference.

  • Ensure that the content and format of each line in the file are correct; otherwise, parsing errors will affect task execution. All lines in the same file must request the same model, or file parsing errors will occur.

  • You can also use format conversion tools or scripts to convert your request file into a JSONL file that meets the requirements. See Convert CSV to JSONL. A minimal example of building such a file is sketched below.
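
For instance, the following Python sketch writes a small input file in the OpenAI-compatible batch line format (custom_id, method, url, body). The prompts and the file name batch_input.jsonl are illustrative; check the exact field layout against the downloadable sample file.

```python
import json

# Illustrative prompts; every line in the file must request the same model.
prompts = ["What is batch inference?", "Explain the JSONL format in one sentence."]

requests = [
    {
        "custom_id": f"request-{i}",      # unique ID used to match results later
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "qwen-plus",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(prompts, start=1)
]

# One JSON object per line, as required for a JSONL data file.
with open("batch_input.jsonl", "w", encoding="utf-8") as f:
    for request in requests:
        f.write(json.dumps(request, ensure_ascii=False) + "\n")
```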

2. View and manage batch inference tasks

On the task list page, you can view information about the batch tasks.

  • The Task Progress column displays "Number of processed requests/Total number of requests".

  • Task Status:

    • For tasks in the Executing or Validating state, you can click Cancel Task. For Completed tasks, you can click View Results.

    • For Failed tasks, hover over the status to check the error message.

Query by workspace

Batch inference tasks are automatically assigned to the workspace in which they were created. You can filter tasks by workspace. You can also enter a Task Name or Task ID in the search box and click the search icon to start the query.

Workspace permission descriptions:

  • All: View tasks from all workspaces.

  • Default Workspace: View tasks only from the default workspace.

  • Name of sub-workspace: View tasks only from the selected workspace.

3. Download result file

After a batch inference task is completed, you can click View Results to check and download the result file.

For Completed or Stopped tasks, the results are saved in the result file, and error messages are saved in the error file.

A Failed status indicates that file parsing failed, so no result file or error file is returned. Hover over the status to view the error message and check your uploaded file accordingly.
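
As a sketch, the snippet below reads a downloaded result file, assuming the OpenAI-compatible batch output format in which each JSONL line echoes the request's custom_id alongside a response payload. The file name and field paths are assumptions to verify against your actual download.

```python
import json

# Map each custom_id back to the model's reply.
answers = {}
with open("batch_output.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        body = record["response"]["body"]  # assumed result structure
        answers[record["custom_id"]] = body["choices"][0]["message"]["content"]

for custom_id, content in answers.items():
    print(f"{custom_id}: {content[:80]}")
```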

API reference

Model Studio provides a Batch interface that is compatible with OpenAI. You can use it to create and execute batch inference tasks. For details, see Batch.
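
As an illustration, the following sketch drives a batch task with the OpenAI Python SDK. The base_url shown and the DASHSCOPE_API_KEY environment variable are assumptions; substitute the endpoint and credential for your account, and see Batch for the authoritative parameters.

```python
import os
import time

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # assumed credential variable
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

# 1. Upload the JSONL data file for batch processing.
input_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")

# 2. Create the batch task against the chat completions endpoint.
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Poll until the task reaches a terminal state.
while batch.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(30)
    batch = client.batches.retrieve(batch.id)

# 4. Download the result file (and the error file, if any).
if batch.output_file_id:
    client.files.content(batch.output_file_id).write_to_file("batch_output.jsonl")
if batch.error_file_id:
    client.files.content(batch.error_file_id).write_to_file("batch_errors.jsonl")
```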

Statistics

Go to Model Observation and select Batch Inference as the Inference Type. Select a time period within the last 15 days (accurate to the second) to view statistics for all models during that period.

Important

The model call time for batch inference is based on the end time of the task. Therefore, ongoing tasks will not be displayed here.

Click Monitor to the right of a model and select Batch Inference as the Inference Type. Select a time period within the last 15 days (accurate to the second) to view trends in Call Statistics and Performance Metrics for that model.

Billing

The cost of batch inference is 50% of that of real-time inference. For specific pricing, see Models.

Batch calling does not support discounts such as free quota or context cache.
In each task, only requests that have been successfully executed are billed. Unexecuted requests are not billed.