Without connection reuse, each API call opens a new TCP connection and performs a TLS handshake, adding latency. In high-concurrency scenarios, this overhead causes timeouts and resource waste. Connection reuse eliminates repeated setup, reducing latency and resource consumption.
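The effect of reuse can be observed locally without calling any cloud service. The following is a minimal sketch using only the Python standard library: it starts a throwaway HTTP server on localhost and counts how many TCP connections the server accepts when a client opens a new connection per call versus reusing one. The names Handler, connections_opened, and count_connections are hypothetical, and the sketch uses plain HTTP, so it shows only the TCP part of the saving, not the TLS handshake.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

connections_opened = []  # one entry per TCP connection accepted by the server


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections alive

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        super().setup()
        connections_opened.append(self.client_address)

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def count_connections(reuse: bool, calls: int = 5) -> int:
    """Send `calls` GET requests and return how many connections the server accepted."""
    connections_opened.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        if reuse:
            conn = http.client.HTTPConnection("127.0.0.1", port)
            for _ in range(calls):
                conn.request("GET", "/")
                conn.getresponse().read()
            conn.close()
        else:
            for _ in range(calls):
                conn = http.client.HTTPConnection("127.0.0.1", port)
                conn.request("GET", "/")
                conn.getresponse().read()
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(connections_opened)


print(count_connections(reuse=False))  # one connection per call
print(count_connections(reuse=True))   # a single connection shared by all calls
```

With HTTPS, each of the extra connections in the first case would also pay for a TLS handshake.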
The DashScope SDK supports connection reuse in both Java and Python:
- Java SDK: A built-in connection pool is enabled by default. Configure parameters like maximum connections and timeout durations.
- Python SDK: Enable connection reuse by passing a custom Session. It supports both synchronous and asynchronous calls.
Before you begin
Before running the code examples:
Java SDK
The Java SDK includes a built-in connection pool enabled by default. Adjust maximum connections and timeout settings to optimize for your workload.
Parameters
| Parameter | Description | Default value | Unit |
|---|---|---|---|
| connectTimeout | Connection establishment timeout. | 120 | seconds |
| readTimeout | Data read timeout. | 300 | seconds |
| writeTimeout | Data write timeout. | 60 | seconds |
| connectionIdleTimeout | Idle connection timeout in the pool. | 300 | seconds |
| connectionPoolSize | Maximum connections in the pool. | 32 | connections |
| maximumAsyncRequests | Maximum concurrent requests across all hosts (global limit). | 32 | requests |
| maximumAsyncRequestsPerHost | Maximum concurrent requests per host. | 32 | requests |
Parameter constraints:
- maximumAsyncRequests must be ≤ connectionPoolSize; otherwise, requests may block.
- maximumAsyncRequestsPerHost must be ≤ maximumAsyncRequests.
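These two constraints can be checked before a configuration is applied. The following is a minimal sketch; validate_pool_settings is a hypothetical helper, not part of the DashScope SDK, and it only mirrors the two rules stated above.

```python
from typing import List


def validate_pool_settings(connection_pool_size: int,
                           maximum_async_requests: int,
                           maximum_async_requests_per_host: int) -> List[str]:
    """Return the violated constraints (an empty list means the settings are consistent)."""
    problems = []
    if maximum_async_requests > connection_pool_size:
        problems.append("maximumAsyncRequests must be <= connectionPoolSize")
    if maximum_async_requests_per_host > maximum_async_requests:
        problems.append("maximumAsyncRequestsPerHost must be <= maximumAsyncRequests")
    return problems


# All three limits set to the same value: consistent.
print(validate_pool_settings(256, 256, 256))  # → []
# Raising only the request limit above the pool size violates the first rule.
print(validate_pool_settings(32, 64, 32))
```

A check like this is most useful when the three values come from separate configuration sources and can drift apart.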
Tuning guidelines:
- connectTimeout: In low-latency scenarios, set a shorter timeout to reduce wait time.
- connectionIdleTimeout: In high-concurrency scenarios, extend idle timeout to avoid frequent connection creation and reduce resource consumption.
- connectionPoolSize: In high-concurrency scenarios, too few connections cause blocking, timeouts, and frequent reconnections (higher resource usage). Too many connections overload the server. Balance the connection count based on your workload.
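One common starting point for connectionPoolSize is Little's law: the average number of in-flight requests equals the request rate multiplied by the average response time. The sketch below applies that rule of thumb. It is not a formula prescribed by the DashScope documentation; estimate_pool_size, the headroom factor, and the sample workload are all assumptions for illustration.

```python
import math


def estimate_pool_size(requests_per_second: float,
                       avg_latency_seconds: float,
                       headroom: float = 1.5) -> int:
    """Estimate the connections needed: rate x latency, scaled by a safety margin."""
    in_flight = requests_per_second * avg_latency_seconds
    return max(1, math.ceil(in_flight * headroom))


# Hypothetical workload: 20 requests per second, 4 seconds average response time.
print(estimate_pool_size(20, 4))  # 20 * 4 * 1.5 = 120
```

Treat the result as a first guess and adjust it after observing blocking or timeouts under real load.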
Code example
The following example configures connection pool parameters (timeout, maximum connections) and calls a model service. Adjust parameters to optimize concurrency and resource usage.
// Recommended DashScope SDK version >= 2.12.0
import java.time.Duration;
import java.util.Arrays;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.protocol.ConnectionConfigurations;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;
public class Main {
    public static GenerationResult callWithMessage() throws ApiException, NoApiKeyException, InputRequiredException {
        // This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
        Message systemMsg = Message.builder()
                .role(Role.SYSTEM.getValue())
                .content("You are a helpful assistant.")
                .build();
        Message userMsg = Message.builder()
                .role(Role.USER.getValue())
                .content("Who are you?")
                .build();
        GenerationParam param = GenerationParam.builder()
                // API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // This example uses qwen-plus. Change the model name as needed. Model list: https://www.alibabacloud.com/help/zh/model-studio/getting-started/models
                .model("qwen-plus")
                .messages(Arrays.asList(systemMsg, userMsg))
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)
                .build();
        System.out.println(userMsg.getContent());
        return gen.call(param);
    }

    public static void main(String[] args) {
        // Connection pool configuration
        Constants.connectionConfigurations = ConnectionConfigurations.builder()
                .connectTimeout(Duration.ofSeconds(10)) // Timeout for establishing a connection, default 120s
                .readTimeout(Duration.ofSeconds(300)) // Timeout for reading data, default 300s
                .writeTimeout(Duration.ofSeconds(60)) // Timeout for writing data, default 60s
                .connectionIdleTimeout(Duration.ofSeconds(300)) // Timeout for idle connections in the connection pool, default 300s
                .connectionPoolSize(256) // Maximum connections in the connection pool, default 32
                .maximumAsyncRequests(256) // Maximum concurrent requests, default 32
                .maximumAsyncRequestsPerHost(256) // Maximum concurrent requests per host, default 32
                .build();
        try {
            GenerationResult result = callWithMessage();
            System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent());
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            // Use a logging framework to record exception information
            System.err.println("An error occurred while calling the generation service: " + e.getMessage());
        }
        System.exit(0);
    }
}
Python SDK
The Python SDK supports connection reuse by passing a custom Session. It supports two calling methods: asynchronous HTTP (coroutine-based) and synchronous HTTP.
Asynchronous HTTP
For asynchronous scenarios, use aiohttp.ClientSession with aiohttp.TCPConnector to enable connection reuse. TCPConnector supports configuring parameters like connection limits:
| Parameter | Description | Default value | Notes |
|---|---|---|---|
| limit | Total connection limit (all hosts). | 100 | In high-concurrency scenarios, increase this value to improve concurrency. |
| limit_per_host | Connection limit per host. | 0 (unlimited) | Prevents excessive load on a single host. |
| ssl | SSL context configuration. | None | SSL certificate validation for HTTPS. |
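When many coroutines share one session, the limit value caps how many requests can be in flight at once; the remaining requests wait for a free connection. The sketch below imitates that queuing effect with asyncio.Semaphore and a stand-in coroutine so that it runs without network access. It illustrates the behavior only and is not how aiohttp is implemented; run_with_limit and fake_model_call are hypothetical names.

```python
import asyncio


async def run_with_limit(total_calls: int, limit: int) -> int:
    """Run `total_calls` fake requests with at most `limit` in flight; return the peak."""
    slots = asyncio.Semaphore(limit)  # plays the role of the connector's `limit`
    in_flight = 0
    peak = 0

    async def fake_model_call() -> None:  # hypothetical stand-in for a real API call
        nonlocal in_flight, peak
        async with slots:  # wait for a free "connection"
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # simulated network round trip
            in_flight -= 1

    await asyncio.gather(*(fake_model_call() for _ in range(total_calls)))
    return peak


# 20 calls are started together, but the peak concurrency never exceeds 5.
print(asyncio.run(run_with_limit(total_calls=20, limit=5)))
```

If the observed peak regularly sits at the limit while callers wait, that is a sign the limit is the bottleneck and may be worth raising.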
Code example
The following example configures connection reuse and calls a model service asynchronously:
import asyncio
import aiohttp
import ssl
import certifi
from dashscope import AioGeneration
import dashscope
import os
async def main():
    # This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
    dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
    # API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
    # Configure connection parameters
    connector = aiohttp.TCPConnector(
        limit=100,  # Total connection limit
        limit_per_host=30,  # Connection limit per host
        ssl=ssl.create_default_context(cafile=certifi.where()),
    )
    # Create a custom Session and pass it to the call method
    async with aiohttp.ClientSession(connector=connector) as session:
        response = await AioGeneration.call(
            model='qwen-plus',
            prompt='Hello, please introduce yourself',
            session=session,  # Pass the custom Session
        )
        print(response)

asyncio.run(main())
Synchronous HTTP
For synchronous scenarios, use requests.Session to enable connection reuse. Multiple requests within the same Session reuse the underlying TCP connection, avoiding the overhead of repeatedly establishing connections.
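By default, a requests.Session keeps at most 10 pooled connections per host. If several threads share one Session, the usual requests mechanism for raising that limit is to mount an HTTPAdapter with a larger pool_maxsize. The sketch below shows only that requests-level configuration. Whether the DashScope SDK sends every call through the adapters mounted on the Session you pass is an assumption, and build_session and the pool sizes are hypothetical.

```python
import requests
from requests.adapters import HTTPAdapter


def build_session(pool_maxsize: int = 32) -> requests.Session:
    """Create a Session whose HTTPS pool keeps up to `pool_maxsize` connections per host."""
    session = requests.Session()
    adapter = HTTPAdapter(
        pool_connections=4,         # number of per-host pools to cache
        pool_maxsize=pool_maxsize,  # connections kept per host
    )
    session.mount("https://", adapter)  # applies to all HTTPS URLs
    return session


session = build_session(pool_maxsize=32)
try:
    # Pass `session` to the SDK call here, as in the examples in this section.
    pass
finally:
    session.close()
```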
Code example: single call
The following example configures connection reuse and calls a model service synchronously:
import requests
from dashscope import Generation
import dashscope
import os
# This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
# Use with statement to ensure Session closes correctly
with requests.Session() as session:
    response = Generation.call(
        model='qwen-plus',
        prompt='Hello',
        session=session  # Pass the custom Session
    )
    print(response)
Code example: multiple calls with a shared Session
To reuse a Session across multiple calls:
import requests
from dashscope import Generation
import dashscope
import os
# This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
# Create a Session object
session = requests.Session()
try:
    # Reuse the same Session for multiple calls
    response1 = Generation.call(
        model='qwen-plus',
        prompt='Hello',
        session=session
    )
    print(response1)
    response2 = Generation.call(
        model='qwen-plus',
        prompt='Introduce yourself',
        session=session
    )
    print(response2)
finally:
    # Ensure Session closes correctly
    session.close()
Best practices
- Java SDK: Set connectionPoolSize and maximumAsyncRequests based on your workload. Balance the connection count: too few connections cause blocking, and too many overload the server.
- Python SDK: Use the with statement to manage the Session lifecycle and ensure correct resource cleanup.
- Choose the right method: Use async calls for async applications (asyncio, FastAPI). Use sync calls for traditional sync applications.
Error codes
If a model call fails, see Error messages to resolve the issue.