
Alibaba Cloud Model Studio:OpenAI compatible - Batch Chat

Last Updated:Mar 17, 2026

For non-real-time scenarios such as data annotation and content generation, the Batch Chat API offers a low-cost, high-concurrency alternative that uses the same synchronous call method as the real-time API. A limited-time 50% discount is available.

This API supports only single-request submissions. To submit multiple requests at once, package them in a file. See OpenAI compatible - Batch (file input).

How it works

  1. Submit request: The client sends a request and establishes a connection.

  2. Queue and wait: The request enters a queue while the client maintains the connection.

  3. Return result: The server returns the complete result over the established connection after processing.

    If the maximum wait time is exceeded, the connection is closed with a timeout error.

Availability

Chinese Mainland

In Chinese Mainland deployment mode, the endpoint and data storage are located in the Beijing region, and computing resources are limited to the Chinese Mainland.

Important
  • Some models support thinking mode. When enabled, this mode generates thinking tokens and increases costs.

  • The qwen3.5 series (such as qwen3.5-plus and qwen3.5-flash) enables thinking mode by default. When using hybrid-thinking models, explicitly set the enable_thinking parameter to true or false.
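For HTTP callers, the note above can be made concrete by building the request body directly. The sketch below is illustrative: the helper name is made up, and placing enable_thinking as a top-level body field mirrors how the OpenAI SDKs' extra_body merges extra parameters into the request JSON.

```python
import json

def build_batch_chat_body(model, user_content, enable_thinking=None):
    """Build the JSON body for an OpenAI-compatible Batch Chat request.

    enable_thinking applies only to hybrid-thinking models; leave it as
    None to keep the model's default behavior.
    """
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_content},
        ],
    }
    if enable_thinking is not None:
        # With the OpenAI SDKs, this key would be passed via extra_body,
        # which merges it into the top-level request JSON.
        body["enable_thinking"] = enable_thinking
    return json.dumps(body)

# Explicitly disable thinking mode for a hybrid-thinking model.
print(build_batch_chat_body("qwen3.5-plus", "Who are you?", enable_thinking=False))
```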

Usage

Prerequisites

Step 1: Configure the API endpoint

Switch from real-time to batch inference by modifying the API endpoint (base_url) based on your call method:

SDK: Set base_url to https://batch.dashscope.aliyuncs.com/compatible-mode/v1

HTTP: POST https://batch.dashscope.aliyuncs.com/compatible-mode/v1/chat/completions

Step 2: Make a call

The following examples show how to call the Batch Chat API. The default timeout is 3600 seconds (1 hour); no extra configuration is needed in most cases.

Custom timeout range: 60–3600 seconds.
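If you set a custom timeout, it can help to validate it against the documented range before configuring the client, so bad values fail fast on your side. This small helper is a sketch; the function name is illustrative.

```python
MIN_TIMEOUT_S = 60    # documented custom timeout range: 60-3600 seconds
MAX_TIMEOUT_S = 3600

def validate_batch_timeout(seconds):
    """Fail fast on timeouts outside the documented range instead of
    sending a request the service would reject or cap."""
    if not MIN_TIMEOUT_S <= seconds <= MAX_TIMEOUT_S:
        raise ValueError(
            f"timeout must be in [{MIN_TIMEOUT_S}, {MAX_TIMEOUT_S}] s, got {seconds}"
        )
    return seconds

print(validate_batch_timeout(1800))  # 1800s (30 min), as used in the examples
```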

Python

Request example

import os
from openai import OpenAI

client = OpenAI(
    # If environment variable not set, replace with api_key="sk-xxx".
    # Avoid hard-coding API keys in production to reduce leak risk.
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://batch.dashscope.aliyuncs.com/compatible-mode/v1",  # Batch Chat API endpoint
).with_options(timeout=1800.0)  # Timeout: 1800s (30 min). Max: 3600s.

completion = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ]
)
print(completion.choices[0].message.content)

Response example

I am Qwen, a large-scale language model from Alibaba Group. I can answer questions, create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, and more. I can also express opinions and play games. If you have any questions or need help, feel free to let me know!

Java

Request example

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.models.chat.completions.ChatCompletion;
import com.openai.models.chat.completions.ChatCompletionCreateParams;

import java.time.Duration;

public class Main {
    public static void main(String[] args) {
        OpenAIClient client = OpenAIOkHttpClient.builder()
                // If environment variable not set, replace with .apiKey("sk-xxx").
                // Avoid hard-coding API keys in production to reduce leak risk.
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .baseUrl("https://batch.dashscope.aliyuncs.com/compatible-mode/v1")  // Batch Chat API endpoint
                .timeout(Duration.ofSeconds(1800)) // Timeout: 1800s (30 min). Max: 3600s.
                .build();

        ChatCompletionCreateParams params = ChatCompletionCreateParams.builder()
                .addUserMessage("Who are you?")
                .model("qwen-plus")
                .build();

        try {
            ChatCompletion chatCompletion = client.chat().completions().create(params);
            System.out.println(chatCompletion);
        } catch (Exception e) {
            System.err.println("Error occurred: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Response example

ChatCompletion{id=chatcmpl-a12c115e-15fc-94f9-a984-81bd65f0527b, choices=[Choice{finishReason=stop, index=0, logprobs=, message=ChatCompletionMessage{
content=I am Qwen, a large-scale language model from Alibaba Group. I can help you answer questions, create text, and provide information query services. It's nice to meet you!, refusal=, role=assistant, annotations=, audio=, functionCall=, toolCalls=, additionalProperties={}}, additionalProperties={}}], created=1763609020, model=qwen-plus, object_=chat.completion, serviceTier=, systemFingerprint=, usage=CompletionUsage{completionTokens=33, promptTokens=10, totalTokens=43, completionTokensDetails=, promptTokensDetails=, additionalProperties={}}, additionalProperties={}}

Node.js

Request example

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // If environment variable not set, replace with apiKey: "sk-xxx".
        // Avoid hard-coding API keys in production to reduce leak risk.
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://batch.dashscope.aliyuncs.com/compatible-mode/v1", // Batch Chat API endpoint
        // Timeout in milliseconds: 1800s = 1,800,000ms. Max: 3,600,000ms.
        timeout: 1800 * 1000,
    }
);

async function main() {
    const completion = await openai.chat.completions.create({
        model: "qwen-plus",  // Replace with your model name as needed
        messages: [
            { role: "system", content: "You are a helpful assistant." },
            { role: "user", content: "Who are you?" }
        ],
    });
    console.log(JSON.stringify(completion))
}

main();

Response example

{
    "created": 1763618557,
    "usage": {
        "completion_tokens": 80,
        "prompt_tokens": 22,
        "total_tokens": 102
    },
    "model": "qwen-plus",
    "id": "chatcmpl-af23c086-8662-91eb-b236-892032ddee92",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "I am Qwen, a large-scale language model developed by Alibaba Cloud. I can answer questions and create text, such as stories, official documents, emails, and scripts. I can also perform logical reasoning, write code, express opinions, and play games. I support multiple languages, including Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to let me know!"
            }
        }
    ],
    "object": "chat.completion"
}

Go

Request example

package main

import (
    "context"
    "os"
    "time"

    "github.com/openai/openai-go"
    "github.com/openai/openai-go/option"
)

func main() {
    client := openai.NewClient(
        option.WithAPIKey(os.Getenv("DASHSCOPE_API_KEY")),
        // Batch Chat API endpoint
        option.WithBaseURL("https://batch.dashscope.aliyuncs.com/compatible-mode/v1"),
    )
    // Timeout: 1800s (30 min). Max: 3600s.
    ctx, cancel := context.WithTimeout(context.Background(), 1800*time.Second)
    defer cancel()

    chatCompletion, err := client.Chat.Completions.New(
        ctx, openai.ChatCompletionNewParams{
            Messages: []openai.ChatCompletionMessageParamUnion{
                openai.UserMessage("Who are you?"),
            },
            Model: "qwen-plus",
        },
    )

    if err != nil {
        panic(err.Error())
    }

    println(chatCompletion.Choices[0].Message.Content)
}

Response example

I am Qwen, a large-scale language model developed by Alibaba Cloud. I can generate various types of text, such as articles, stories, and poems, and can adapt and expand them for different scenarios and needs. In addition, I can answer various questions and provide help and solutions. I am happy to be of service!

C# (HTTP)

Request example

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class Program
{
    private static readonly HttpClient httpClient = new HttpClient
    {
        // Timeout: 1800s (30 min). Max: 3600s.
        Timeout = TimeSpan.FromSeconds(1800)
    };

    static async Task Main(string[] args)
    {
        // If environment variable not set, replace with: string? apiKey = "sk-xxx";
        string? apiKey = Environment.GetEnvironmentVariable("DASHSCOPE_API_KEY");

        if (string.IsNullOrEmpty(apiKey))
        {
            Console.WriteLine("API Key is not set. Make sure the 'DASHSCOPE_API_KEY' environment variable is set.");
            return;
        }

        // Set the request URL and content
        string url = "https://batch.dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"; // Batch Chat API endpoint
        // Replace with your model name as needed
        string jsonContent = @"{
            ""model"": ""qwen-plus"",
            ""messages"": [
                {
                    ""role"": ""system"",
                    ""content"": ""You are a helpful assistant.""
                },
                {
                    ""role"": ""user"", 
                    ""content"": ""Who are you?""
                }
            ]
        }";

        // Send the request and get the response
        string result = await SendPostRequestAsync(url, jsonContent, apiKey);

        // Print the result
        Console.WriteLine(result);
    }

    private static async Task<string> SendPostRequestAsync(string url, string jsonContent, string apiKey)
    {
        using (var content = new StringContent(jsonContent, Encoding.UTF8, "application/json"))
        {
            // Set request headers
            httpClient.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
            httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

            // Send the request and get the response
            HttpResponseMessage response = await httpClient.PostAsync(url, content);

            // Process the response
            if (response.IsSuccessStatusCode)
            {
                return await response.Content.ReadAsStringAsync();
            }
            else
            {
                return $"Request failed: {response.StatusCode}";
            }
        }
    }
}

Response example

{
    "created": 1763620689,
    "usage": {
        "completion_tokens": 60,
        "prompt_tokens": 22,
        "total_tokens": 82
    },
    "model": "qwen-plus",
    "id": "chatcmpl-db85828d-af47-97a3-a2f4-120b8f7d72d3",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "I am Qwen, a large-scale language model from Alibaba Group. I can answer questions, create text such as stories, official documents, emails, and scripts, perform logical reasoning, write code, and more. I can also express opinions and play games. If you have any questions or need help, feel free to let me know!"
            }
        }
    ],
    "object": "chat.completion"
}

PHP (HTTP)

Request example

<?php
// Set the URL for the Batch Chat request
$url = 'https://batch.dashscope.aliyuncs.com/compatible-mode/v1/chat/completions';
// If environment variable not set, replace with: $apiKey = "sk-xxx";
$apiKey = getenv('DASHSCOPE_API_KEY');
// Set request headers
$headers = [
    'Authorization: Bearer '.$apiKey,
    'Content-Type: application/json'
];
// Set the request body
$data = [
    // Replace with your model name as needed
    "model" => "qwen-plus",
    "messages" => [
        [
            "role" => "system",
            "content" => "You are a helpful assistant."
        ],
        [
            "role" => "user",
            "content" => "Who are you?"
        ]
    ]
];
// Initialize cURL session
$ch = curl_init();
// Set cURL options
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
// Timeout: 1800s (30 min). Max: 3600s.
curl_setopt($ch, CURLOPT_TIMEOUT, 1800);
// Execute cURL session
$response = curl_exec($ch);
// Check for errors
if (curl_errno($ch)) {
    echo 'Curl error: ' . curl_error($ch);
}
// Close cURL resource
curl_close($ch);
// Print the response
echo $response;
?>

Response example

{
    "created": 1763621824,
    "usage": {
        "completion_tokens": 81,
        "prompt_tokens": 22,
        "total_tokens": 103
    },
    "model": "qwen-plus",
    "id": "chatcmpl-b25aeb86-5cfe-93ea-ab03-aa3de1381c23",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "I am Qwen, a large-scale language model developed by Alibaba Cloud. I can answer questions and create text, such as stories, official documents, emails, and scripts. I can also perform logical reasoning, write code, and even express opinions and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to let me know!"
            }
        }
    ],
    "object": "chat.completion"
}

curl

Request example

Set --max-time to 1800 seconds (maximum: 3600 seconds).
curl -X POST https://batch.dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
--max-time 1800 \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Who are you?"
        }
    ]
}'

Response example

{
    "created": 1763622152,
    "usage": {
        "completion_tokens": 79,
        "prompt_tokens": 22,
        "total_tokens": 101
    },
    "model": "qwen-plus",
    "id": "chatcmpl-daa344d2-60df-9b79-81a4-28c9a10a0a0e",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "I am Qwen, a large-scale language model from Alibaba Group. I can answer questions and create text, such as stories, official documents, emails, and scripts. I can also perform logical reasoning, write code, express opinions, and play games. I support multiple languages, including but not limited to Chinese, English, German, French, and Spanish. If you have any questions or need help, feel free to let me know!"
            }
        }
    ],
    "object": "chat.completion"
}

Limitations

  • Wait time: The maximum synchronous wait is 3600 seconds (1 hour). A custom timeout can be set in the range of 60–3600 seconds.

  • Concurrency limits: A maximum of 10,000 pending requests per model per account. Requests beyond this limit are rejected with an error code; new requests are accepted only after pending requests complete.

  • Call rate: A maximum of 1000 QPS per account, or 10,000 calls per 10 seconds.

    This is a theoretical maximum; actual availability depends on system load. Implement retry logic in your client.
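One way to implement the retry logic recommended above is exponential backoff with jitter. The sketch below is generic: send_request stands in for any zero-argument wrapper around the actual call, and production code would catch the SDK's specific rate-limit and timeout exceptions rather than a bare Exception.

```python
import random
import time

def call_with_retry(send_request, max_attempts=4, base_delay=1.0):
    """Retry a Batch Chat call with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return send_request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            # Backoff grows as base_delay * 2^attempt; jitter spreads out
            # retries so many clients do not hammer the service in sync.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

For example, call_with_retry(lambda: client.chat.completions.create(...)) would wrap the Python request shown in Step 2.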

Billing

  • Unit price: Billing is based on the input and output tokens of successful requests. The list price matches the real-time call price, and a limited-time 50% discount is available on the official website. See Model list.

  • Billing scope: Only successful requests are billed. Failed requests (system errors or timeouts) are not billed.

Note
  • Batch inference is a separate billing item. It is not eligible for discounts, such as subscription (Savings Plan) or free quotas for new users. It also does not support features such as context cache.

  • Some models, such as qwen3.5-plus and qwen3.5-flash, have thinking mode enabled by default. This mode generates additional thinking tokens, which are billed at the output token price and increase costs. To control costs, set the `enable_thinking` parameter based on task complexity. For more information, see Deep thinking.

Error codes

If the model call fails and returns an error message, see Error messages for resolution.

FAQ

  1. Is there a difference in request time between Batch Chat and the real-time API?

    Yes. Requests are queued for scheduling, so the end-to-end time is typically longer than with the real-time API. The maximum wait is 1 hour; if it is exceeded, the connection is closed with an error.

  2. How do I choose between Batch Chat and Batch File?

    Choose Batch Chat when you have many independent dialogue requests and want high concurrency through synchronous calls. Choose Batch File when you need to process a single large file containing many requests and retrieve the results asynchronously.

  3. Does Batch Chat guarantee that all requests will be completed?

    No. Completion depends on the allocation of shared resources, and requests may queue when resources are busy. If a request is not executed within the maximum wait time, the connection times out. Timed-out requests are not billed; retry them later.

References