When calling models in high-concurrency scenarios, you may encounter request timeouts or high resource consumption. You can optimize connection reuse by configuring the connection pool.
Connection pool
In high-concurrency scenarios, frequently creating and destroying network connections causes significant performance overhead. Therefore, the DashScope Java SDK enables connection pooling by default, which reduces resource consumption and improves request processing efficiency through connection reuse.
You can optimize connection reuse by adjusting the number of connections and timeout settings. Here are the related parameters:
Parameter | Description | Default value | Unit | Note |
connectTimeout | The timeout period for establishing a connection. | 120 | seconds | In low-latency scenarios, you typically need to set shorter connection timeout periods to reduce waiting time and improve response speed. |
readTimeout | The timeout period for reading data. | 300 | seconds | |
writeTimeout | The timeout period for writing data. | 60 | seconds | |
connectionIdleTimeout | The timeout period for idle connections in the connection pool. | 300 | seconds | In high-concurrency scenarios, extending the idle connection timeout period helps avoid frequent connection creation and reduce resource consumption. |
connectionPoolSize | The maximum number of connections in the connection pool. | 32 | connections | In high-concurrency scenarios, adjust the value based on your business requirements. |
maximumAsyncRequests | The maximum number of concurrent requests. This is a global limit for concurrent requests that includes all hosts. This value must be less than or equal to the maximum number of connections. Otherwise, requests may be blocked. | 32 | requests | |
maximumAsyncRequestsPerHost | The maximum number of concurrent requests per host. This value must be less than or equal to the maximum number of concurrent requests. | 32 | requests | |
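The ordering constraints in the table (maximumAsyncRequestsPerHost ≤ maximumAsyncRequests ≤ connectionPoolSize) can be checked before the values are passed to the SDK. The following is a minimal sketch: `PoolLimits` and `validate` are hypothetical names used for illustration only and are not part of the DashScope SDK.

```java
// Hypothetical helper, not part of the DashScope SDK.
// It only checks the ordering constraints described in the parameter table.
final class PoolLimits {

    static void validate(int connectionPoolSize,
                         int maximumAsyncRequests,
                         int maximumAsyncRequestsPerHost) {
        if (connectionPoolSize <= 0 || maximumAsyncRequests <= 0 || maximumAsyncRequestsPerHost <= 0) {
            throw new IllegalArgumentException("All limits must be positive");
        }
        // Concurrent requests beyond the pool size may be blocked.
        if (maximumAsyncRequests > connectionPoolSize) {
            throw new IllegalArgumentException(
                    "maximumAsyncRequests (" + maximumAsyncRequests
                            + ") must not exceed connectionPoolSize (" + connectionPoolSize + ")");
        }
        // The per-host limit cannot exceed the global limit.
        if (maximumAsyncRequestsPerHost > maximumAsyncRequests) {
            throw new IllegalArgumentException(
                    "maximumAsyncRequestsPerHost (" + maximumAsyncRequestsPerHost
                            + ") must not exceed maximumAsyncRequests (" + maximumAsyncRequests + ")");
        }
    }

    public static void main(String[] args) {
        // Consistent values pass silently.
        validate(256, 256, 256);

        // More concurrent requests than pooled connections is rejected.
        try {
            validate(32, 64, 64);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Running a check like this once at startup surfaces an inconsistent configuration immediately, rather than as blocked requests under load.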
Sample requests
Install the SDK
The following sample code shows how to configure connection pool parameters, such as the timeout periods and the maximum number of connections, and then call the model. Adjust these parameters based on your requirements to optimize concurrent performance and resource utilization.
// We recommend DashScope SDK version >= 2.12.0
import java.time.Duration;
import java.util.Arrays;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.protocol.ConnectionConfigurations;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
    public static GenerationResult callWithMessage() throws ApiException, NoApiKeyException, InputRequiredException {
        Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
        Message systemMsg = Message.builder()
                .role(Role.SYSTEM.getValue())
                .content("You are a helpful assistant.")
                .build();
        Message userMsg = Message.builder()
                .role(Role.USER.getValue())
                .content("Who are you?")
                .build();
        GenerationParam param = GenerationParam.builder()
                // If environment variables are not configured, replace the following line with: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // This example uses qwen-plus. You can change the model name as needed. For the model list, visit: https://www.alibabacloud.com/help/en/model-studio/getting-started/models
                .model("qwen-plus")
                .messages(Arrays.asList(systemMsg, userMsg))
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)
                .build();
        System.out.println(userMsg.getContent());
        return gen.call(param);
    }

    public static void main(String[] args) {
        // Connection pool configuration
        Constants.connectionConfigurations = ConnectionConfigurations.builder()
                .connectTimeout(Duration.ofSeconds(10)) // Connection establishment timeout, default: 120s
                .readTimeout(Duration.ofSeconds(300)) // Data read timeout, default: 300s
                .writeTimeout(Duration.ofSeconds(60)) // Data write timeout, default: 60s
                .connectionIdleTimeout(Duration.ofSeconds(300)) // Idle connection timeout in the connection pool, default: 300s
                .connectionPoolSize(256) // Maximum number of connections in the connection pool, default: 32
                .maximumAsyncRequests(256) // Maximum number of concurrent requests, default: 32
                .maximumAsyncRequestsPerHost(256) // Maximum number of concurrent requests per host, default: 32
                .build();
        try {
            GenerationResult result = callWithMessage();
            System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent());
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            // Use a logging framework to record exception information
            System.err.println("An error occurred while calling the generation service: " + e.getMessage());
        }
        System.exit(0);
    }
}
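When many application threads call the model at once, one possible way to keep the number of in-flight calls within the configured maximumAsyncRequests value is to gate calls with a semaphore on the application side. The following is a minimal sketch under stated assumptions: `BoundedCaller` and `runDemo` are hypothetical names, not part of the DashScope SDK, and the `Thread.sleep` task is a stand-in for a real call such as `gen.call(param)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of the DashScope SDK.
final class BoundedCaller {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    BoundedCaller(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the task only while holding a permit, so that at most
    // maxConcurrent tasks execute at the same time.
    <T> T call(Callable<T> task) throws Exception {
        permits.acquire();
        try {
            // Track the highest number of simultaneous calls observed.
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            return task.call();
        } finally {
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    // Submits taskCount simulated calls from 16 threads and returns the peak concurrency.
    static int runDemo(int limit, int taskCount) {
        BoundedCaller caller = new BoundedCaller(limit);
        ExecutorService executor = Executors.newFixedThreadPool(16);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                // Stand-in for a real model call such as gen.call(param).
                Callable<String> modelCall = () -> {
                    Thread.sleep(20);
                    return "response-" + id;
                };
                Callable<String> bounded = () -> caller.call(modelCall);
                futures.add(executor.submit(bounded));
            }
            for (Future<String> future : futures) {
                future.get();
            }
            return caller.peak.get();
        } catch (Exception e) {
            throw new IllegalStateException("Demo run failed", e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        // The limit would normally match the configured maximumAsyncRequests value.
        System.out.println("Peak concurrent calls: " + runDemo(4, 20));
    }
}
```

In this sketch the thread pool has more threads than permits, so the excess threads wait on the semaphore instead of issuing requests that exceed the configured limit.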
Error code
If the call fails and returns an error message, see Error messages.