Reusing connections improves network efficiency in high concurrency scenarios, which reduces timeouts and resource consumption.
Connection reuse
The DashScope SDK supports reusing existing connections to reduce resource consumption and improve processing efficiency.
Java SDK: Enabled by default. Includes a built-in connection pool mechanism. You can configure parameters such as maximum connections and timeout.
Python SDK: Supports connection reuse through a custom Session that you pass to the call. Works with both synchronous and asynchronous invocation methods.
Java SDK
The DashScope Java SDK includes a built-in connection pool mechanism that is enabled by default. Adjust the connection pool’s maximum connections and timeout settings as needed to optimize connection reuse.
Parameters
Parameter | Description | Default value | Unit | Notes |
connectTimeout | Timeout for establishing a connection. | 120 | seconds | In low-latency scenarios, set a shorter connection timeout to reduce waiting time and improve response speed. |
readTimeout | Timeout for reading data. | 300 | seconds | |
writeTimeout | Timeout for writing data. | 60 | seconds | |
connectionIdleTimeout | Timeout for idle connections in the connection pool. | 300 | seconds | In high concurrency scenarios, extending the idle connection timeout helps avoid frequent connection creation, reducing resource consumption. |
connectionPoolSize | Maximum number of connections in the connection pool. | 32 | connections | In high concurrency scenarios, adjust this value as needed. |
maximumAsyncRequests | Maximum number of concurrent requests. This is a global limit across all hosts. It must be less than or equal to connectionPoolSize; otherwise, requests may block. | 32 | requests | |
maximumAsyncRequestsPerHost | Maximum number of concurrent requests per host. It must be less than or equal to maximumAsyncRequests. | 32 | requests | |
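The ordering constraints in the table (per-host requests ≤ total requests ≤ pool size) can be written down as a quick sanity check. The sketch below uses Python only for brevity; `PoolSettings` and `check_pool_settings` are hypothetical names for illustration and are not part of either SDK.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PoolSettings:
    """Hypothetical container mirroring three Java SDK parameter names."""
    connection_pool_size: int = 32
    maximum_async_requests: int = 32
    maximum_async_requests_per_host: int = 32


def check_pool_settings(s: PoolSettings) -> List[str]:
    """Return a list of violated constraints (empty if the settings are consistent)."""
    problems = []
    # Total concurrent requests should not exceed the pool size.
    if s.maximum_async_requests > s.connection_pool_size:
        problems.append(
            "maximumAsyncRequests exceeds connectionPoolSize; requests may block"
        )
    # Per-host concurrency should not exceed the global limit.
    if s.maximum_async_requests_per_host > s.maximum_async_requests:
        problems.append(
            "maximumAsyncRequestsPerHost exceeds maximumAsyncRequests"
        )
    return problems


print(check_pool_settings(PoolSettings()))              # defaults are consistent
print(check_pool_settings(PoolSettings(64, 256, 256)))  # pool smaller than request limit
```

Raising all three values together, as the Java example in this section does, keeps the settings consistent.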
Code examples
Before running the code, export the API key as an environment variable and install the latest SDK.
The following code example shows how to configure connection pool parameters such as timeout and maximum connections, and call model services. Adjust these parameters to optimize concurrent performance and resource utilization.
// Recommended DashScope SDK version >= 2.12.0
import java.time.Duration;
import java.util.Arrays;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.protocol.ConnectionConfigurations;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;
public class Main {
    public static GenerationResult callWithMessage() throws ApiException, NoApiKeyException, InputRequiredException {
        // This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
        Message systemMsg = Message.builder()
                .role(Role.SYSTEM.getValue())
                .content("You are a helpful assistant.")
                .build();
        Message userMsg = Message.builder()
                .role(Role.USER.getValue())
                .content("Who are you?")
                .build();
        GenerationParam param = GenerationParam.builder()
                // API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // This example uses qwen-plus. Change the model name as needed. Model list: https://www.alibabacloud.com/help/zh/model-studio/getting-started/models
                .model("qwen-plus")
                .messages(Arrays.asList(systemMsg, userMsg))
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)
                .build();
        System.out.println(userMsg.getContent());
        return gen.call(param);
    }

    public static void main(String[] args) {
        // Connection pool configuration
        Constants.connectionConfigurations = ConnectionConfigurations.builder()
                .connectTimeout(Duration.ofSeconds(10)) // Timeout for establishing a connection, default 120s
                .readTimeout(Duration.ofSeconds(300)) // Timeout for reading data, default 300s
                .writeTimeout(Duration.ofSeconds(60)) // Timeout for writing data, default 60s
                .connectionIdleTimeout(Duration.ofSeconds(300)) // Timeout for idle connections in the connection pool, default 300s
                .connectionPoolSize(256) // Maximum connections in the connection pool, default 32
                .maximumAsyncRequests(256) // Maximum concurrent requests, default 32
                .maximumAsyncRequestsPerHost(256) // Maximum concurrent requests per host, default 32
                .build();
        try {
            GenerationResult result = callWithMessage();
            System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent());
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            // Use a logging framework to record exception information
            System.err.println("An error occurred while calling the generation service: " + e.getMessage());
        }
        System.exit(0);
    }
}
Python SDK
The DashScope Python SDK supports connection reuse by passing a custom Session. It provides two invocation methods: HTTP asynchronous (coroutine-based) and HTTP synchronous.
HTTP asynchronous
In asynchronous scenarios, use aiohttp.ClientSession with aiohttp.TCPConnector to enable connection reuse. TCPConnector supports configuring parameters such as connection limits:
Parameter | Description | Default value | Notes |
limit | Total connection limit | 100 | Controls the maximum connections. In high concurrency scenarios, increasing this value can improve concurrent capability. |
limit_per_host | Connection limit per host | 0 (unlimited) | Limits the maximum connections to a single host, preventing excessive pressure on a single server. |
ssl | SSL context configuration | None | SSL certificate validation configuration for HTTPS connections. |
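When many calls share one session, callers beyond the connection limit wait for a free connection. It can be convenient to apply a matching cap in application code as well, so that queued work stays visible and bounded. The following is a minimal sketch using only the standard library: `run_bounded` and `fake_call` are hypothetical names, and `fake_call` stands in for a real model call so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_bounded(
    call: Callable[[str], Awaitable[str]],
    prompts: List[str],
    max_in_flight: int,
) -> List[str]:
    """Run `call` for every prompt with at most `max_in_flight` active at once."""
    gate = asyncio.Semaphore(max_in_flight)

    async def guarded(prompt: str) -> str:
        async with gate:  # waits here while the cap is reached
            return await call(prompt)

    # gather returns results in the same order as `prompts`
    return await asyncio.gather(*(guarded(p) for p in prompts))


# Stand-in for a real model call; records how many calls overlap.
active = 0
peak = 0


async def fake_call(prompt: str) -> str:
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)  # simulate network latency
    active -= 1
    return prompt.upper()


results = asyncio.run(
    run_bounded(fake_call, ["a", "b", "c", "d", "e"], max_in_flight=2)
)
print(results, "peak in flight:", peak)
```

In a real application, `call` could be a small wrapper that invokes `AioGeneration.call` with the shared `session`, with `max_in_flight` kept at or below the `limit_per_host` value of the connector.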
Code examples
Before running the code, export the API key as an environment variable and install the latest SDK.
The following code example shows how to configure connection reuse and call model services in an asynchronous scenario:
import asyncio
import aiohttp
import ssl
import certifi
from dashscope import AioGeneration
import dashscope
import os
async def main():
    # This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
    dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
    # API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
    # Configure connection parameters
    connector = aiohttp.TCPConnector(
        limit=100,  # Total connection limit
        limit_per_host=30,  # Connection limit per host
        ssl=ssl.create_default_context(cafile=certifi.where()),
    )
    # Create a custom Session and pass it to the call method
    async with aiohttp.ClientSession(connector=connector) as session:
        response = await AioGeneration.call(
            model='qwen-plus',
            prompt='Hello, please introduce yourself',
            session=session,  # Pass the custom Session
        )
        print(response)

asyncio.run(main())
HTTP synchronous
In synchronous scenarios, use requests.Session to enable connection reuse. Multiple requests within the same Session reuse the underlying TCP connection, avoiding the overhead of repeatedly establishing connections.
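In `requests`, the reused connections live in per-host pools managed by an `HTTPAdapter` mounted on the Session. The sketch below shows how those pool sizes could be adjusted before the Session is handed to the SDK. It assumes the SDK uses the passed Session as-is; `build_session` is a hypothetical helper, and the sizes are illustrative rather than recommended values.

```python
import requests
from requests.adapters import HTTPAdapter


def build_session(pool_size: int = 32) -> requests.Session:
    """Create a Session whose connection pools hold up to `pool_size` connections."""
    session = requests.Session()
    adapter = HTTPAdapter(
        pool_connections=pool_size,  # number of per-host pools to cache
        pool_maxsize=pool_size,      # maximum connections kept in each pool
    )
    # Use this adapter for all HTTP and HTTPS requests made through the Session.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


# The with statement closes the Session and its pooled connections on exit.
with build_session(pool_size=64) as session:
    adapter = session.get_adapter("https://dashscope-intl.aliyuncs.com/api/v1")
    print(type(adapter).__name__)
```

A Session built this way would be passed through the same `session=` argument used in the examples in this section. Larger pools mainly matter when several requests are in flight at once; a single sequential caller typically needs only one connection.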
Code examples
Before running the code, export the API key as an environment variable and install the latest SDK.
The following code example shows how to configure connection reuse and call model services in a synchronous scenario:
import requests
from dashscope import Generation
import dashscope
import os
# This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
# Use a with statement to ensure the Session closes correctly
with requests.Session() as session:
    response = Generation.call(
        model='qwen-plus',
        prompt='Hello',
        session=session  # Pass the custom Session
    )
    print(response)
To reuse the same Session across multiple calls, use the following method:
import requests
from dashscope import Generation
import dashscope
import os
# This is the URL for the Singapore region. If you use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# API keys for Singapore and Beijing regions differ. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
# Create a Session object
session = requests.Session()
try:
    # Reuse the same Session for multiple calls
    response1 = Generation.call(
        model='qwen-plus',
        prompt='Hello',
        session=session
    )
    print(response1)
    response2 = Generation.call(
        model='qwen-plus',
        prompt='Introduce yourself',
        session=session
    )
    print(response2)
finally:
    # Ensure the Session closes correctly
    session.close()
Best practices
Java SDK: Set connectionPoolSize and maximumAsyncRequests based on your application’s concurrent workload. Avoid setting connection counts too high or too low.
Python SDK: Use the with statement to automatically manage the Session lifecycle and ensure resources are released correctly.
Choose the right call method: Use asynchronous invocation for asynchronous applications, such as those built with asyncio or FastAPI. Use synchronous invocation for traditional synchronous applications.
Error codes
If a model invocation fails and returns an error message, see Error messages to resolve the issue.