Configure connection reuse for DashScope SDK - Alibaba Cloud Model Studio

Without connection reuse, each API call opens a new TCP connection and performs a TLS handshake, which adds latency. Under high concurrency, this overhead can cause timeouts and waste resources. Reusing connections avoids the repeated setup, which reduces both latency and resource consumption.
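
The effect of reuse can be observed locally without calling any model service. The following is a minimal sketch that uses only the Python standard library: a local HTTP server counts how many TCP connections it accepts while a client sends five requests, first over one kept-alive connection and then over a new connection per request. The names CountingHandler and count_connections are hypothetical helpers for this illustration and are not part of the DashScope SDK.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

accepted_connections = 0  # incremented once per accepted TCP connection


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections alive

    def setup(self):
        # setup() runs once per accepted connection, not once per request
        global accepted_connections
        accepted_connections += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def count_connections(reuse: bool, requests_to_send: int = 5) -> int:
    """Send several GET requests and return how many TCP connections the server accepted."""
    global accepted_connections
    accepted_connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        shared = http.client.HTTPConnection(host, port) if reuse else None
        for _ in range(requests_to_send):
            conn = shared if shared is not None else http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()  # read the full body so the connection can be reused
            if not reuse:
                conn.close()
        if shared is not None:
            shared.close()
    finally:
        server.shutdown()
        server.server_close()
    return accepted_connections


if __name__ == "__main__":
    print("connections with reuse:", count_connections(reuse=True))
    print("connections without reuse:", count_connections(reuse=False))
```

With reuse, the server accepts one connection for all five requests. Without reuse, it accepts five. Against a remote HTTPS endpoint, each extra connection also costs a TLS handshake.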

The DashScope SDK supports connection reuse in both Java and Python:

Java SDK: A built-in connection pool is enabled by default. You can tune parameters such as the maximum number of connections and the timeout durations.
Python SDK: Enable connection reuse by passing a custom Session to the call. This works for both synchronous and asynchronous calls.

Before you begin

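Before running the code examples, get an API key, set it as the DASHSCOPE_API_KEY environment variable, and install the DashScope SDK.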

Java SDK

The Java SDK includes a built-in connection pool enabled by default. Adjust maximum connections and timeout settings to optimize for your workload.

Parameters

Parameter	Description	Default value	Unit
connectTimeout	Connection establishment timeout.	120	seconds
readTimeout	Data read timeout.	300	seconds
writeTimeout	Data write timeout.	60	seconds
connectionIdleTimeout	Idle connection timeout in the pool.	300	seconds
connectionPoolSize	Maximum connections in the pool.	32	connections
maximumAsyncRequests	Maximum concurrent requests across all hosts (global limit).	32	requests
maximumAsyncRequestsPerHost	Maximum concurrent requests per host.	32	requests

Parameter constraints:

maximumAsyncRequests must be ≤ connectionPoolSize; otherwise, requests may block.
maximumAsyncRequestsPerHost must be ≤ maximumAsyncRequests.
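
The two constraints above can be checked before you apply a configuration. The following is a minimal sketch of such a check. The function check_pool_settings is a hypothetical helper written for this illustration, not an SDK API, and the SDK is not assumed to perform this validation itself.

```python
from typing import List


def check_pool_settings(connection_pool_size: int,
                        maximum_async_requests: int,
                        maximum_async_requests_per_host: int) -> List[str]:
    """Return the list of violated constraints (empty if the settings are consistent)."""
    problems = []
    if maximum_async_requests > connection_pool_size:
        problems.append(
            "maximumAsyncRequests exceeds connectionPoolSize; requests may block")
    if maximum_async_requests_per_host > maximum_async_requests:
        problems.append(
            "maximumAsyncRequestsPerHost exceeds maximumAsyncRequests")
    return problems


if __name__ == "__main__":
    # Values from the Java code example on this page: all three set to 256
    print(check_pool_settings(256, 256, 256))
    # An inconsistent configuration that violates both constraints
    print(check_pool_settings(32, 64, 128))
```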

Tuning guidelines:

connectTimeout: In low-latency scenarios, set a shorter timeout so that failed connection attempts are detected sooner.
connectionIdleTimeout: In high-concurrency scenarios, extend the idle timeout so that connections stay in the pool longer and are created less often, which reduces resource consumption.
connectionPoolSize: In high-concurrency scenarios, too few connections cause requests to block, time out, and reconnect frequently, which raises resource usage. Too many connections can overload the server. Size the pool to match your workload.
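
One common way to get a starting value for connectionPoolSize is Little's law: the average number of requests in flight equals the request rate multiplied by the average latency. The following is a rough sketch of that estimate. The function estimate_pool_size and its headroom factor of 1.25 are assumptions for illustration, not DashScope recommendations. Validate the result with load testing.

```python
import math


def estimate_pool_size(requests_per_second: float,
                       avg_latency_seconds: float,
                       headroom: float = 1.25) -> int:
    """Estimate a connection pool size from throughput and latency (Little's law)."""
    # Average number of requests in flight at any moment
    in_flight = requests_per_second * avg_latency_seconds
    # Add headroom for bursts (assumed factor) and never return less than 1
    return max(1, math.ceil(in_flight * headroom))


if __name__ == "__main__":
    # 50 requests per second at 2 seconds each: 100 in flight, 125 with headroom
    print(estimate_pool_size(50, 2.0))
```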

Code example

The following example configures connection pool parameters (timeout, maximum connections) and calls a model service. Adjust parameters to optimize concurrency and resource usage.

// Recommended DashScope SDK version >= 2.12.0
import java.time.Duration;
import java.util.Arrays;

import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.protocol.ConnectionConfigurations;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    public static GenerationResult callWithMessage() throws ApiException, NoApiKeyException, InputRequiredException {
        // This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
        Message systemMsg = Message.builder()
                .role(Role.SYSTEM.getValue())
                .content("You are a helpful assistant.")
                .build();
        Message userMsg = Message.builder()
                .role(Role.USER.getValue())
                .content("Who are you?")
                .build();
        GenerationParam param = GenerationParam.builder()
                // API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // This example uses qwen-plus. Change the model name as needed. Model list: https://www.alibabacloud.com/help/zh/model-studio/getting-started/models
                .model("qwen-plus")
                .messages(Arrays.asList(systemMsg, userMsg))
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)
                .build();

        System.out.println(userMsg.getContent());
        return gen.call(param);
    }
    public static void main(String[] args) {
        // Connection pool configuration
        Constants.connectionConfigurations = ConnectionConfigurations.builder()
                .connectTimeout(Duration.ofSeconds(10))  // Timeout for establishing a connection, default 120s
                .readTimeout(Duration.ofSeconds(300)) // Timeout for reading data, default 300s
                .writeTimeout(Duration.ofSeconds(60)) // Timeout for writing data, default 60s
                .connectionIdleTimeout(Duration.ofSeconds(300)) // Timeout for idle connections in the connection pool, default 300s
                .connectionPoolSize(256) // Maximum connections in the connection pool, default 32
                .maximumAsyncRequests(256)  // Maximum concurrent requests, default 32
                .maximumAsyncRequestsPerHost(256) // Maximum concurrent requests per host, default 32
                .build();

        try {
            GenerationResult result = callWithMessage();
            System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent());
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            // Use a logging framework to record exception information
            System.err.println("An error occurred while calling the generation service: " + e.getMessage());
        }
        System.exit(0);
    }
}

Python SDK

The Python SDK supports connection reuse by passing a custom Session. It supports two calling methods: asynchronous HTTP (coroutine-based) and synchronous HTTP.

Asynchronous HTTP

For asynchronous scenarios, use aiohttp.ClientSession with aiohttp.TCPConnector to enable connection reuse. TCPConnector supports configuring parameters like connection limits:

Parameter	Description	Default value	Notes
limit	Total connection limit (all hosts).	100	In high-concurrency scenarios, increase this value to improve concurrency.
limit_per_host	Connection limit per host.	0 (unlimited)	Prevents excessive load on a single host.
ssl	SSL context configuration.	None	SSL certificate validation for HTTPS.
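
To build intuition for how a connection limit bounds concurrency, the following sketch simulates it with asyncio.Semaphore from the standard library. This is a simulation under stated assumptions, not aiohttp itself: the semaphore stands in for the connector's limit, asyncio.sleep stands in for network time, and run_with_limit is a hypothetical helper.

```python
import asyncio


async def run_with_limit(total_requests: int, limit: int) -> int:
    """Run simulated requests and return the peak number in flight at once."""
    slots = asyncio.Semaphore(limit)  # stands in for the connector's connection limit
    in_flight = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak
        async with slots:              # wait until a "connection" is free
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for network time
            in_flight -= 1

    await asyncio.gather(*(fake_request() for _ in range(total_requests)))
    return peak


if __name__ == "__main__":
    # 20 requests with a limit of 5: at most 5 run at once, the rest wait
    print(asyncio.run(run_with_limit(total_requests=20, limit=5)))
```

Requests beyond the limit are not rejected. They wait for a free slot, so a limit that is too low shows up as added latency rather than as errors.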

Code example

The following example configures connection reuse and calls a model service asynchronously:

import asyncio
import aiohttp
import ssl
import certifi
from dashscope import AioGeneration
import dashscope
import os

async def main():
    # This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
    dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

    # API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

    # Configure connection parameters
    connector = aiohttp.TCPConnector(
        limit=100,           # Total connection limit
        limit_per_host=30,   # Connection limit per host
        ssl=ssl.create_default_context(cafile=certifi.where()),
    )

    # Create a custom Session and pass it to the call method
    async with aiohttp.ClientSession(connector=connector) as session:
        response = await AioGeneration.call(
            model='qwen-plus',
            prompt='Hello, please introduce yourself',
            session=session,  # Pass the custom Session
        )
        print(response)

asyncio.run(main())

Synchronous HTTP

For synchronous scenarios, use requests.Session to enable connection reuse. Multiple requests within the same Session reuse the underlying TCP connection, avoiding the overhead of repeatedly establishing connections.

Code example: single call

The following example configures connection reuse and calls a model service synchronously:

import requests
from dashscope import Generation
import dashscope
import os

# This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# Use a with statement to ensure the Session is closed correctly
with requests.Session() as session:
    response = Generation.call(
        model='qwen-plus',
        prompt='Hello',
        session=session  # Pass the custom Session
    )
    print(response)

Code example: multiple calls with a shared Session

To reuse a Session across multiple calls:

import requests
from dashscope import Generation
import dashscope
import os

# This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# Create a Session object
session = requests.Session()

try:
    # Reuse the same Session for multiple calls
    response1 = Generation.call(
        model='qwen-plus',
        prompt='Hello',
        session=session
    )
    print(response1)

    response2 = Generation.call(
        model='qwen-plus',
        prompt='Introduce yourself',
        session=session
    )
    print(response2)
finally:
    # Ensure Session closes correctly
    session.close()
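

A requests.Session keeps up to 10 pooled connections per host by default (the pool_maxsize default of HTTPAdapter). If many concurrent requests share one Session, connections beyond that size are discarded after use instead of being returned to the pool. The following sketch raises the pool size by mounting a custom HTTPAdapter. The function build_session and the values it uses are assumptions for illustration. The sketch also assumes that the SDK sends its requests through the Session you pass, so the mounted adapter applies.

```python
import requests
from requests.adapters import HTTPAdapter


def build_session(pool_maxsize: int = 32) -> requests.Session:
    """Create a Session whose HTTPS connection pool holds up to pool_maxsize connections per host."""
    session = requests.Session()
    adapter = HTTPAdapter(
        pool_connections=4,         # number of per-host pools to cache
        pool_maxsize=pool_maxsize,  # connections kept per host pool
    )
    # Use this adapter for every HTTPS URL requested through the session
    session.mount("https://", adapter)
    return session


if __name__ == "__main__":
    with build_session(pool_maxsize=64) as session:
        # Pass session=session to Generation.call(...) as in the examples above
        print(type(session.get_adapter("https://dashscope-intl.aliyuncs.com/api/v1")).__name__)
```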

Best practices

Java SDK: Set connectionPoolSize and maximumAsyncRequests based on your workload. Balance the connection count: too few connections cause blocking, and too many can overload the server.
Python SDK: Use a with statement to manage the Session lifecycle so that resources are released correctly.
Choose the right method: Use async calls for async applications (asyncio, FastAPI). Use sync calls for traditional sync applications.

Error codes

If a model call fails, see Error messages to resolve the issue.