Alibaba Cloud Model Studio provides an OpenAI-compatible Batch File API. Submit tasks in bulk using files. The system processes them asynchronously and returns results when tasks complete or reach the maximum wait time. Costs are only 50% of real-time calls. This is ideal for data analytics, model evaluation, and other large-scale workloads where latency is not critical.
To use the console, see Batch inference.
Workflow
Prerequisites
You can call the Batch File API using the OpenAI SDK (Python, Node.js) or HTTP API.
Get an API key: Get and configure your Model Studio API key as an environment variable
Install SDK (optional): Install the OpenAI SDK if you plan to use it.
Service endpoints
Chinese mainland:
https://dashscope.aliyuncs.com/compatible-mode/v1International:
https://dashscope-intl.aliyuncs.com/compatible-mode/v1
Availability
International
In the international deployment mode, both the endpoint and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled across global regions, excluding the Chinese mainland.
Supported models: qwen-max, qwen-plus, qwen-flash, qwen-turbo.
Chinese mainland
In the Chinese mainland deployment mode, both the endpoint and data storage are located in the Beijing region. Model inference compute resources are available only in the Chinese mainland.
Supported models:
Text generation models: Stable versions of Qwen-Max, Plus, Flash, and Long, and some
latestversions. QwQ series (qwq-plus) and some third-party models (deepseek-r1, deepseek-v3) are also supported.Multimodal models: Stable versions of Qwen-VL-Max, Plus, and Flash, plus some
latestversions. Also supported is the Qwen-OCR model.Text embedding models: text-embedding-v4.
Some models support thinking mode. When enabled, this mode generates thinking
tokensand increases costs.The
qwen3.5series (such asqwen3.5-plusandqwen3.5-flash) enable thinking mode by default. When using hybrid-thinking models, explicitly set theenable_thinkingparameter (trueorfalse).
Getting started
Before processing formal tasks, test with the batch-test-model. This test model skips inference and returns a fixed success response to verify your API call chain and data format.
Limitations of batch-test-model:
Your test file must meet Input file requirements. It must be no larger than 1 MB and contain no more than 100 lines.
Concurrency limit: Up to 2 parallel tasks.
Cost: The test model does not incur any model inference fees.
Step 1: Prepare the input file
Prepare a file named test_model.jsonl with the following content:
{"custom_id":"1","method":"POST","url":"/v1/chat/ds-test","body":{"model":"batch-test-model","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Hello! How can I help you?"}]}}
{"custom_id":"2","method":"POST","url":"/v1/chat/ds-test","body":{"model":"batch-test-model","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"}]}}Multimodal models (e.g., qwen-vl-plus) support file URLs and Base64-encoded inputs:
{"custom_id":"image-url","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"}},{"type":"text","text":"Describe this image."}]}]}}
{"custom_id":"image-base64","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEA8ADwAAD..."}},{"type":"text","text":"Describe this image."}]}]}}Step 2: Run the code
Select a sample code snippet for your programming language. Save it in the same directory as your input file and run it. The code handles the full workflow: upload, create task, poll status, and download results.
To adjust the file path or other parameters, modify the code as needed.
Step 3: Verify test results
After the task succeeds, the result file result.jsonl will contain the fixed response {"content":"This is a test result."}:
{"id":"a2b1ae25-21f4-4d9a-8634-99a29926486c","custom_id":"1","response":{"status_code":200,"request_id":"a2b1ae25-21f4-4d9a-8634-99a29926486c","body":{"created":1743562621,"usage":{"completion_tokens":6,"prompt_tokens":20,"total_tokens":26},"model":"batch-test-model","id":"chatcmpl-bca7295b-67c3-4b1f-8239-d78323bb669f","choices":[{"finish_reason":"stop","index":0,"message":{"content":"This is a test result."}}],"object":"chat.completion"}},"error":null}
{"id":"39b74f09-a902-434f-b9ea-2aaaeebc59e0","custom_id":"2","response":{"status_code":200,"request_id":"39b74f09-a902-434f-b9ea-2aaaeebc59e0","body":{"created":1743562621,"usage":{"completion_tokens":6,"prompt_tokens":20,"total_tokens":26},"model":"batch-test-model","id":"chatcmpl-1e32a8ba-2b69-4dc4-be42-e2897eac9e84","choices":[{"finish_reason":"stop","index":0,"message":{"content":"This is a test result."}}],"object":"chat.completion"}},"error":null}Run a formal task
Input file requirements
Format: UTF-8 encoded JSONL (one independent JSON object per line).
Size limits: Up to 50,000 requests per file and no larger than 500 MB.
Line limit: Each JSON object up to 6 MB and within the model's context window.
Consistency: All requests in the same file must use the same model and thinking mode (if applicable).
Unique identifier: Each request must include a unique custom_id field within the file. This is used to match results.
Scenario 1: Text chat
Example file content:
{"custom_id":"1","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-plus","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Hello!"}]}}
{"custom_id":"2","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-plus","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"}]}}Scenario 2: Image and video understanding
Multimodal models (e.g., qwen-vl-plus) support file URLs and Base64-encoded inputs.
{"custom_id":"image-url","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"}},{"type":"text","text":"Describe this image."}]}]}}
{"custom_id":"image-base64","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEA8ADwAAD..."}},{"type":"text","text":"Describe this image."}]}]}}
{"custom_id":"video-url","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"video","video":"https://example.com/video.mp4"},{"type":"text","text":"Describe this video."}]}]}}
{"custom_id":"video-base64","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"video","video":["data:image/jpeg;base64,{frame1}","data:image/jpeg;base64,{frame2}","data:image/jpeg;base64,{frame3}"]},{"type":"text","text":"Describe this video."}]}]}}
{"custom_id":"multi-image-url","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"https://example.com/image1.jpg"}},{"type":"image_url","image_url":{"url":"https://example.com/image2.jpg"}},{"type":"text","text":"Compare these two images."}]}]}}
{"custom_id":"multi-image-base64","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"data:image/jpeg;base64,{image1_base64}"}},{"type":"image_url","image_url":{"url":"data:image/jpeg;base64,{image2_base64}"}},{"type":"text","text":"Compare these two images."}]}]}}The Base64 strings in the examples above are truncated. Generate full encodings using the Python code below.
For full details (including file limits, MIME types, and encoding methods), see Pass local files (Base64 encoding or file paths).
JSONL batch generation tool
Use this tool to quickly generate JSONL files.
1. Modify the input file
In the
test_model.jsonlfile, set the `model` parameter to the model to use and set the `url` field:Model type
url
Text generation/multimodal models
/v1/chat/completionsText embedding models
/v1/embeddingsOr use the "JSONL batch generation tool" above to generate a new file for formal tasks. Ensure the
modelandurlfields are correct.
2. Modify the Getting Started code
Change the input file path to your file name.
Set the endpoint parameter to match the url field in your input file.
3. Run the code and wait for results
When the task is complete, the results of successful requests are saved to the local result.jsonl file. If any requests failed, the error details are saved to the error.jsonl file.
Successful results (
output_file_id): Each line corresponds to one successful original request and includes thecustom_idandresponse.{"id":"3a5c39d5-3981-4e4c-97f2-e0e821893f03","custom_id":"req-001","response":{"status_code":200,"request_id":"3a5c39d5-3981-4e4c-97f2-e0e821893f03","body":{"created":1768306034,"usage":{"completion_tokens":654,"prompt_tokens":14,"total_tokens":668},"model":"qwen-plus","id":"chatcmpl-3a5c39d5-3981-4e4c-97f2-e0e821893f03","choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"Hello! Hangzhou West Lake is a famous scenic spot in China, located in the western part of Hangzhou City, Zhejiang Province, hence the name \"West Lake\". It is one of China's top ten scenic spots and a World Cultural Heritage site (listed by UNESCO in 2011). It is renowned worldwide for its beautiful natural scenery and profound cultural heritage.\n\n### I. Natural Landscape\nWest Lake is surrounded by mountains on three sides and borders the city on one side, covering an area of approximately 6.39 square kilometers, shaped like a ruyi scepter with rippling blue waters. The lake is naturally or artificially divided into multiple water areas by Solitary Hill, Bai Causeway, Su Causeway, and Yanggong Causeway, forming a layout of \"one mountain, two pagodas, three islands, and three causeways\".\n\nMain attractions include the following:\n- **Spring Dawn at Su Causeway**: During the Northern Song Dynasty, the great literary figure Su Dongpo, while serving as the prefect of Hangzhou, led the dredging of West Lake and used the excavated silt to build a causeway, later named \"Su Causeway\". In spring, peach blossoms and willows create a picturesque scene.\n- **Lingering Snow on Broken Bridge**: Located at the eastern end of Bai Causeway, this is where the reunion scene from the Legend of the White Snake took place. After snowfall in winter, it is particularly famous for its silver-white appearance.\n- **Leifeng Pagoda at Sunset**: Leifeng Pagoda glows golden under the setting sun and was once one of the \"Ten Scenes of West Lake\".\n- **Three Pools Mirroring the Moon**: On Xiaoyingzhou Island in the lake, there are three stone pagodas. During the Mid-Autumn Festival, lanterns can be lit inside the pagodas, creating a harmonious interplay of moonlight, lamplight, and lake reflections.\n- **Autumn Moon over Calm Lake**: Located at the western end of Bai Causeway, it is an excellent spot for viewing the moon over the lake.\n- **Viewing Fish at Flower Harbor**: Known for viewing flowers and fish, with peonies and koi complementing each other beautifully in the garden.\n\n### II. Cultural History\nWest Lake not only boasts beautiful scenery but also carries rich historical and cultural significance:\n- Since the Tang and Song dynasties, numerous literati such as Bai Juyi, Su Dongpo, Lin Bu, and Yang Wanli have left poems here.\n- Bai Juyi oversaw the construction of \"Bai Causeway\" and dredged West Lake, benefiting the local people.\n- Around West Lake are many historical sites, including Yuewang Temple (commemorating national hero Yue Fei), Lingyin Temple (a millennium-old Buddhist temple), Liuhe Pagoda, and Longjing Village (the origin of Longjing tea, one of China's top ten famous teas).\n\n### III. Cultural Symbolism\nWest Lake is regarded as a representative of \"paradise on earth\" and a model of traditional Chinese landscape aesthetics. It embodies the philosophical concept of \"harmony between heaven and humanity\" by integrating natural beauty with cultural depth. Many poems, paintings, and operas feature West Lake, making it an important symbol of Chinese culture.\n\n### IV. Travel Recommendations\n- Best visiting seasons: Spring (March-May) for peach blossoms and willows, Autumn (September-November) for clear skies and cool weather.\n- Recommended ways: Walking, cycling (along the lakeside greenway), or boating on the lake.\n- Local cuisine: West Lake vinegar fish, Longjing shrimp, Dongpo pork, pian'erchuan noodles.\n\nIn summary, Hangzhou West Lake is not just a natural wonder but also a living cultural museum worth exploring in detail. If you ever visit Hangzhou, don't miss this earthly paradise that is \"equally charming in light or heavy makeup\"."}}],"object":"chat.completion"}},"error":null} {"id":"628312ba-172c-457d-ba7f-3e5462cc6899","custom_id":"req-002","response":{"status_code":200,"request_id":"628312ba-172c-457d-ba7f-3e5462cc6899","body":{"created":1768306035,"usage":{"completion_tokens":25,"prompt_tokens":18,"total_tokens":43},"model":"qwen-plus","id":"chatcmpl-628312ba-172c-457d-ba7f-3e5462cc6899","choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"The spring breeze brushes green willows,\nNight rain nourishes red flowers.\nBird songs fill the forest,\nMountains and rivers share the same beauty."}}],"object":"chat.completion"}},"error":null}Failure details (
error_file_id): Contains information about failed request lines and error reasons. See Error codes for troubleshooting.
Detailed procedure
The Batch API process consists of four steps: upload file, create task, query task status, and download results.
1. Upload file
2. Create a batch task
3. Query and manage batch tasks
4. Download Batch result file
Advanced features
Set completion notifications
For long-running tasks, use asynchronous notifications instead of polling to save resources.
Completion notification is only supported in the Beijing region.
Callback: Specify a publicly accessible URL when creating the task.
EventBridge message queue: Deeply integrated with the Alibaba Cloud ecosystem and requires no public IP.
Method 1: Callback
Method 2: EventBridge message queue
Going live
File management
Periodically delete unnecessary files using the OpenAI File delete API to avoid hitting storage limits (10,000 files or 100 GB).
Store large files in OSS instead.
Task monitoring
Prefer using Callback or EventBridge asynchronous notifications.
If polling is required, set the interval to more than 1 minute and use an exponential backoff strategy.
Error handling
Implement comprehensive exception handling covering network errors, API errors, and other potential issues.
Download and analyze error details in
error_file_id.For common error codes, see Error messages for solutions.
Cost optimization
Combine small tasks into a single batch.
Set
completion_windowappropriately to provide greater scheduling flexibility.
Utility tools
CSV to JSONL
JSONL results to CSV
Rate limits
Interface | Rate limit (per Alibaba Cloud account) |
Create task | 1,000 calls/minute, up to 1,000 concurrent tasks |
Query task | 1,000 calls/minute |
Query task list | 100 calls/minute |
Cancel task | 1,000 calls/minute |
Billing
Unit price: The input and output tokens for all successful requests are charged at 50% of the real-time inference price for the corresponding model. For more information, see Model list.
Billing scope:
Only requests successfully executed within a task are billed.
Requests that fail because of file parsing errors, task execution failures, or row-level errors do not incur charges.
For canceled tasks, requests successfully completed before the cancellation are still billed as normal.
Batch inference is a separate billing item. It supports AI General-purpose Savings Plan, but not discounts, such as subscription (other savings plans) or free quotas for new users. It also does not support features such as context cache.
Some models, such as qwen3.5-plus and qwen3.5-flash, have thinking mode enabled by default. This mode generates additional thinking tokens, which are billed at the output token price and increase costs. To control costs, set the `enable_thinking` parameter based on task complexity. For more information, see Deep thinking.
Error codes
If a call fails and returns an error message, see Error messages for a solution.
FAQ
How do I choose between Batch Chat and Batch File?
Use Batch File when you need to process a single large file with many requests asynchronously. Use Batch Chat when business logic requires submitting many independent conversation requests synchronously with high concurrency.
How is the Batch File API billed? Do I need to purchase a separate package?
Batch uses pay-as-you-go billing based on tokens in successful requests. No package purchase required.
Are submitted batch files executed in order?
No. The system uses dynamic scheduling based on compute load and doesn't guarantee order. Tasks may be delayed when resources are constrained.
How long does it take to complete a submitted batch file?
Execution time depends on system resources and task scale. If a task does not complete within the completion_window, it expires. Unprocessed requests in expired tasks are not executed and incur no charges.
Scenario recommendations: Use real-time calls for scenarios that require strict real-time model inference. Use batch calls for large-scale data processing scenarios that can tolerate some delay.