Alibaba Cloud Model Studio: Configure connection reuse for DashScope SDK

Last Updated: Feb 09, 2026

Reusing connections improves network efficiency in high concurrency scenarios, which reduces timeouts and resource consumption.

Connection reuse

The DashScope SDK supports reusing existing connections to reduce resource consumption and improve processing efficiency.

  • Java SDK: Enabled by default. Includes a built-in connection pool mechanism. You can configure parameters such as maximum connections and timeout.

  • Python SDK: Supports connection reuse by passing a custom Session. Includes synchronous and asynchronous invocation methods.

Java SDK

The DashScope Java SDK includes a built-in connection pool mechanism that is enabled by default. Adjust the connection pool’s maximum connections and timeout settings as needed to optimize connection reuse.

Parameters

| Parameter | Description | Default value | Unit | Notes |
|---|---|---|---|---|
| connectTimeout | Timeout for establishing a connection. | 120 | seconds | In low-latency scenarios, set a shorter connection timeout to reduce waiting time and improve response speed. |
| readTimeout | Timeout for reading data. | 300 | seconds | |
| writeTimeout | Timeout for writing data. | 60 | seconds | |
| connectionIdleTimeout | Timeout for idle connections in the connection pool. | 300 | seconds | In high concurrency scenarios, extending the idle connection timeout helps avoid frequent connection creation, reducing resource consumption. |
| connectionPoolSize | Maximum connections in the connection pool. | 32 | connections | In high concurrency scenarios: if the number of connections is too low, requests may block or time out, or connections may be created frequently, increasing resource consumption; if the number of connections is too high, the server-side load may become excessive. Adjust the configuration as needed. |
| maximumAsyncRequests | Maximum concurrent requests. This is a global limit for concurrent requests across all hosts. It must be less than or equal to the maximum connections; otherwise, requests may block. | 32 | requests | |
| maximumAsyncRequestsPerHost | Maximum concurrent requests per host. It must be less than or equal to the maximum concurrent requests. | 32 | requests | |
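
The two ordering rules in the table (per-host requests must not exceed global requests, and global requests must not exceed the pool size) can be checked before a configuration is applied. The following is a minimal sketch in Python rather than Java; `check_pool_settings` is a hypothetical helper that only encodes those two rules and is not part of the DashScope SDK.

```python
def check_pool_settings(connection_pool_size: int,
                        maximum_async_requests: int,
                        maximum_async_requests_per_host: int) -> list:
    """Return the violated rules as strings (empty list if the settings are consistent)."""
    problems = []
    if maximum_async_requests > connection_pool_size:
        problems.append("maximumAsyncRequests exceeds connectionPoolSize; requests may block")
    if maximum_async_requests_per_host > maximum_async_requests:
        problems.append("maximumAsyncRequestsPerHost exceeds maximumAsyncRequests")
    return problems


# The documented defaults (32 / 32 / 32) satisfy both rules.
print(check_pool_settings(32, 32, 32))
# A pool smaller than the global request limit violates the first rule.
print(check_pool_settings(32, 64, 64))
```

Raising all three values together, as the Java example in this section does with 256, keeps both rules satisfied.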

Code examples

Before running the code, export your API key as the DASHSCOPE_API_KEY environment variable and install the latest SDK.

The following code example shows how to configure connection pool parameters such as timeout and maximum connections, and call model services. Adjust these parameters to optimize concurrent performance and resource utilization.

// Recommended DashScope SDK version >= 2.12.0
import java.time.Duration;
import java.util.Arrays;

import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.protocol.ConnectionConfigurations;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    public static GenerationResult callWithMessage() throws ApiException, NoApiKeyException, InputRequiredException {
        // This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
        Message systemMsg = Message.builder()
                .role(Role.SYSTEM.getValue())
                .content("You are a helpful assistant.")
                .build();
        Message userMsg = Message.builder()
                .role(Role.USER.getValue())
                .content("Who are you?")
                .build();
        GenerationParam param = GenerationParam.builder()
                // API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // This example uses qwen-plus. Change the model name as needed. Model list: https://www.alibabacloud.com/help/zh/model-studio/getting-started/models
                .model("qwen-plus")
                .messages(Arrays.asList(systemMsg, userMsg))
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)
                .build();

        System.out.println(userMsg.getContent());
        return gen.call(param);
    }
    public static void main(String[] args) {
        // Connection pool configuration
        Constants.connectionConfigurations = ConnectionConfigurations.builder()
                .connectTimeout(Duration.ofSeconds(10))  // Timeout for establishing a connection, default 120s
                .readTimeout(Duration.ofSeconds(300)) // Timeout for reading data, default 300s
                .writeTimeout(Duration.ofSeconds(60)) // Timeout for writing data, default 60s
                .connectionIdleTimeout(Duration.ofSeconds(300)) // Timeout for idle connections in the connection pool, default 300s
                .connectionPoolSize(256) // Maximum connections in the connection pool, default 32
                .maximumAsyncRequests(256)  // Maximum concurrent requests, default 32
                .maximumAsyncRequestsPerHost(256) // Maximum concurrent requests per host, default 32
                .build();

        try {
            GenerationResult result = callWithMessage();
            System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent());
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            // Use a logging framework to record exception information
            System.err.println("An error occurred while calling the generation service: " + e.getMessage());
        }
        System.exit(0);
    }
}

Python SDK

The DashScope Python SDK supports connection reuse by passing a custom Session. It provides two invocation methods: HTTP asynchronous (coroutine-based) and HTTP synchronous.

HTTP asynchronous

In asynchronous scenarios, use aiohttp.ClientSession with aiohttp.TCPConnector to enable connection reuse. TCPConnector supports configuring parameters such as connection limits:

| Parameter | Description | Default value | Notes |
|---|---|---|---|
| limit | Total connection limit | 100 | Controls the maximum number of connections. In high concurrency scenarios, increasing this value allows more requests to run concurrently. |
| limit_per_host | Connection limit per host | 0 (unlimited) | Limits the maximum connections to a single host, preventing excessive pressure on any one server. |
| ssl | SSL context configuration | None | SSL certificate validation configuration for HTTPS connections. |
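
To illustrate how a total connection limit caps concurrency, the sketch below models the limit with `asyncio.Semaphore` instead of a real `TCPConnector`, so it runs without aiohttp or network access. The helper name `simulate_requests` and the 10 ms sleep are illustrative assumptions. A real connector also manages keep-alive and per-host limits, which this sketch does not attempt to reproduce.

```python
import asyncio


async def simulate_requests(limit: int, total_requests: int) -> int:
    """Run fake requests with at most `limit` in flight; return the observed peak."""
    slots = asyncio.Semaphore(limit)  # stands in for the connector's connection limit
    in_flight = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak
        async with slots:              # wait for a free "connection"
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # pretend to wait for the server
            in_flight -= 1

    await asyncio.gather(*(fake_request() for _ in range(total_requests)))
    return peak


peak = asyncio.run(simulate_requests(limit=3, total_requests=10))
print(f"peak concurrent requests: {peak}")
```

With ten requests and a limit of three, the remaining requests queue until a slot frees up. This is the same trade-off the `limit` parameter controls.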

Code examples

Before running the code, export your API key as the DASHSCOPE_API_KEY environment variable and install the latest SDK.

The following code example shows how to configure connection reuse and call model services in an asynchronous scenario:

import asyncio
import aiohttp
import ssl
import certifi
from dashscope import AioGeneration
import dashscope
import os

async def main():
    # This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
    dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

    # API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

    # Configure connection parameters
    connector = aiohttp.TCPConnector(
        limit=100,           # Total connection limit
        limit_per_host=30,   # Connection limit per host
        ssl=ssl.create_default_context(cafile=certifi.where()),
    )
    
    # Create a custom Session and pass it to the call method
    async with aiohttp.ClientSession(connector=connector) as session:
        response = await AioGeneration.call(
            model='qwen-plus',
            prompt='Hello, please introduce yourself',
            session=session,  # Pass the custom Session
        )
        print(response)

asyncio.run(main())

HTTP synchronous

In synchronous scenarios, use requests.Session to enable connection reuse. Multiple requests within the same Session reuse the underlying TCP connection, avoiding the overhead of repeatedly establishing connections.
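
The effect of reuse can be observed locally without calling any model service. The sketch below starts a throwaway HTTP server on 127.0.0.1 and counts how many TCP connections it accepts. It uses the standard library's `http.client` in place of `requests.Session` so that it has no third-party dependencies. The helper name `count_server_connections` is hypothetical, and the sketch only illustrates the keep-alive behavior described above. It is not how the SDK is implemented.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def count_server_connections(reuse: bool, n_requests: int = 3) -> int:
    """Send n_requests to a local server; return how many TCP connections it accepted."""
    accepted = 0
    lock = threading.Lock()

    class Handler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"  # enables keep-alive on the server side

        def setup(self):
            # setup() runs once per accepted TCP connection
            nonlocal accepted
            with lock:
                accepted += 1
            super().setup()

        def do_GET(self):
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the demo output quiet

    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        if reuse:
            # One connection object carries every request.
            conn = http.client.HTTPConnection("127.0.0.1", port)
            for _ in range(n_requests):
                conn.request("GET", "/")
                conn.getresponse().read()  # drain the body so the socket can be reused
            conn.close()
        else:
            # A fresh connection is opened and closed for each request.
            for _ in range(n_requests):
                conn = http.client.HTTPConnection("127.0.0.1", port)
                conn.request("GET", "/")
                conn.getresponse().read()
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return accepted


print("with reuse:   ", count_server_connections(reuse=True))
print("without reuse:", count_server_connections(reuse=False))
```

With reuse, the server accepts a single connection for all requests. Without it, each request pays for a new TCP handshake (and, over HTTPS, a new TLS handshake).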

Code examples

Before running the code, export your API key as the DASHSCOPE_API_KEY environment variable and install the latest SDK.

The following code example shows how to configure connection reuse and call model services in a synchronous scenario:

import requests
from dashscope import Generation
import dashscope
import os

# This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# Use a with statement to ensure the Session closes correctly
with requests.Session() as session:
    response = Generation.call(
        model='qwen-plus',
        prompt='Hello',
        session=session  # Pass the custom Session
    )
    print(response)

To reuse the same Session across multiple calls, use the following method:

import requests
from dashscope import Generation
import dashscope
import os

# This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# Create a Session object
session = requests.Session()

try:
    # Reuse the same Session for multiple calls
    response1 = Generation.call(
        model='qwen-plus',
        prompt='Hello',
        session=session
    )
    print(response1)
    
    response2 = Generation.call(
        model='qwen-plus',
        prompt='Introduce yourself',
        session=session
    )
    print(response2)
finally:
    # Ensure the Session closes correctly
    session.close()

Best practices

  • Java SDK: Set connectionPoolSize and maximumAsyncRequests based on your application’s concurrent workload. Too few connections can cause requests to block or time out; too many can place excessive load on the server.

  • Python SDK: Use the with statement to automatically manage the Session lifecycle and ensure resources are released correctly.

  • Choose the right call method: Use asynchronous invocation for asynchronous applications, such as those built with asyncio or FastAPI. Use synchronous invocation for traditional synchronous applications.
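
As a starting point for sizing a pool against a concurrent workload, one common rule of thumb is Little's law: the average number of in-flight requests equals the arrival rate multiplied by the average latency. The sketch below applies that rule with a headroom factor. The helper `estimate_pool_size`, the 25% default headroom, and the example numbers are assumptions for illustration, not DashScope recommendations. Measure your own traffic before settling on a value.

```python
import math


def estimate_pool_size(requests_per_second: float,
                       avg_latency_seconds: float,
                       headroom: float = 1.25) -> int:
    """Estimate concurrent connections as arrival rate x latency, padded by headroom."""
    in_flight = requests_per_second * avg_latency_seconds  # Little's law: L = lambda * W
    return max(1, math.ceil(in_flight * headroom))


# Example: 50 requests/s at 2 s average latency, with the default 25% headroom.
print(estimate_pool_size(50, 2.0))
```

The estimate assumes a steady request rate. Bursty traffic or long streaming responses may need more headroom, while server-side rate limits may call for less.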

Error codes

If a model invocation fails and returns an error message, see Error messages to resolve the issue.