The Paraformer real-time speech recognition service uses the WebSocket protocol for real-time streaming communication. In high-concurrency scenarios, creating and destroying a WebSocket connection for each request consumes significant resources and adds noticeable latency. To optimize performance and ensure stability, the DashScope software development kit (SDK) provides resource reuse mechanisms, such as connection pools and object pools. This document describes how to use these features in the DashScope Java SDK to efficiently call the Paraformer real-time speech recognition service in high-concurrency scenarios.
To use a model in the China (Beijing) region, obtain an API key from the API key page for the China (Beijing) region.
User guide: For model descriptions and selection guidance, see Real-time speech recognition - Fun-ASR/Paraformer.
Prerequisites
You have installed the DashScope Java SDK version 2.16.9 or later. We recommend that you install the latest version.
The Java SDK uses a built-in connection pool and a custom object pool to deliver optimal performance.
Connection pool: The SDK integrates an OkHttp3 connection pool to manage and reuse underlying WebSocket connections. This reduces network handshake overhead. This feature is enabled by default.
Object pool: This feature is based on `commons-pool2` and maintains a group of `Recognition` objects with pre-established connections. Retrieving an object from the pool eliminates connection establishment latency and significantly reduces first-packet latency.
Implementation steps
Add dependencies
Add the dashscope-sdk-java and commons-pool2 dependencies to your project's configuration file.
The following sections provide examples for Maven and Gradle:
Maven
1. Open the `pom.xml` file of your Maven project.
2. Add the following dependencies to the `<dependencies>` tag:

   ```xml
   <dependency>
       <groupId>com.alibaba</groupId>
       <artifactId>dashscope-sdk-java</artifactId>
       <!-- Replace 'the-latest-version' with 2.16.9 or a later version. You can find the version number at https://mvnrepository.com/artifact/com.alibaba/dashscope-sdk-java -->
       <version>the-latest-version</version>
   </dependency>
   <dependency>
       <groupId>org.apache.commons</groupId>
       <artifactId>commons-pool2</artifactId>
       <!-- Replace 'the-latest-version' with the latest version. You can find the version number at https://mvnrepository.com/artifact/org.apache.commons/commons-pool2 -->
       <version>the-latest-version</version>
   </dependency>
   ```

3. Save the `pom.xml` file.
4. Run a Maven command, such as `mvn clean install` or `mvn compile`, to update the project dependencies.
Gradle
1. Open the `build.gradle` file of your Gradle project.
2. Add the following dependencies to the `dependencies` block:

   ```groovy
   dependencies {
       // Replace 'the-latest-version' with 2.16.9 or a later version. You can find the version number at https://mvnrepository.com/artifact/com.alibaba/dashscope-sdk-java
       implementation group: 'com.alibaba', name: 'dashscope-sdk-java', version: 'the-latest-version'
       // Replace 'the-latest-version' with the latest version. You can find the version number at https://mvnrepository.com/artifact/org.apache.commons/commons-pool2
       implementation group: 'org.apache.commons', name: 'commons-pool2', version: 'the-latest-version'
   }
   ```

3. Save the `build.gradle` file.
4. In the command line, switch to the root directory of your project and run the following Gradle command to update the project dependencies:

   ```shell
   ./gradlew build --refresh-dependencies
   ```

   If you use a Windows operating system, run the following command:

   ```shell
   gradlew build --refresh-dependencies
   ```
Configure the connection pool
You can configure key parameters for the connection pool using environment variables:
| Environment variable | Description |
| --- | --- |
| DASHSCOPE_CONNECTION_POOL_SIZE | The connection pool size. Recommended value: more than twice the peak concurrency. Default value: 32. |
| DASHSCOPE_MAXIMUM_ASYNC_REQUESTS | The maximum number of asynchronous requests. Recommended value: the same as DASHSCOPE_CONNECTION_POOL_SIZE. Default value: 32. |
| DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST | The maximum number of asynchronous requests for a single host. Recommended value: the same as DASHSCOPE_CONNECTION_POOL_SIZE. Default value: 32. |
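The fallback behavior described above can be sketched in plain Java. This is not SDK code; the helper name and parsing logic are illustrative assumptions that mirror the documented default of 32 when a variable is unset.

```java
class PoolEnvSketch {
    // Resolves a pool-size environment variable, falling back to a default
    // when the variable is unset or is not a valid integer.
    static int resolve(String name, int defaultValue) {
        String raw = System.getenv(name);
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        // With the variables unset, all three fall back to the documented default of 32.
        System.out.println(resolve("DASHSCOPE_CONNECTION_POOL_SIZE", 32));
        System.out.println(resolve("DASHSCOPE_MAXIMUM_ASYNC_REQUESTS", 32));
        System.out.println(resolve("DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST", 32));
    }
}
```

The SDK reads these variables internally; the sketch only makes the documented defaulting rule concrete.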
Configure the object pool
You can configure the object pool size using an environment variable:
| Environment variable | Description |
| --- | --- |
| RECOGNITION_OBJECTPOOL_SIZE | The object pool size. Recommended value: 1.5 to 2 times the peak concurrency. Default value: 500. |
Important: The size of the object pool (`RECOGNITION_OBJECTPOOL_SIZE`) must be less than or equal to the size of the connection pool (`DASHSCOPE_CONNECTION_POOL_SIZE`). Otherwise, if the connection pool is full when the object pool requests a connection, the calling thread blocks until a connection becomes available. The object pool size must also not exceed the queries per second (QPS) limit of your account.
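The two constraints above can be combined into a simple sizing check. This is an illustrative sketch, not SDK behavior; the method name and the clamping policy are assumptions made for demonstration.

```java
class PoolSizingSketch {
    // Caps the requested object pool size at both the connection pool size
    // and the account's QPS limit, per the constraints described above.
    static int safeObjectPoolSize(int requested, int connectionPoolSize, int accountQpsLimit) {
        return Math.min(requested, Math.min(connectionPoolSize, accountQpsLimit));
    }
}
```

For example, with the default object pool size of 500 and the default connection pool size of 32, this check would cap the object pool at 32; to use the full default object pool size, raise `DASHSCOPE_CONNECTION_POOL_SIZE` to at least 500.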
You can create the object pool using the following code:
```java
class RecognitionObjectPool {
    // ... other code is omitted here; see the complete code below for the full example
    public static GenericObjectPool<Recognition> getInstance() {
        lock.lock();
        if (recognitionGenericObjectPool == null) {
            // You can set the object pool size here or in the RECOGNITION_OBJECTPOOL_SIZE environment variable.
            // We recommend that you set it to 1.5 to 2 times your server's maximum concurrent connections.
            int objectPoolSize = getObjectivePoolSize();
            System.out.println("RECOGNITION_OBJECTPOOL_SIZE: " + objectPoolSize);
            RecognitionObjectFactory recognitionObjectFactory = new RecognitionObjectFactory();
            GenericObjectPoolConfig<Recognition> config = new GenericObjectPoolConfig<>();
            config.setMaxTotal(objectPoolSize);
            config.setMaxIdle(objectPoolSize);
            config.setMinIdle(objectPoolSize);
            recognitionGenericObjectPool = new GenericObjectPool<>(recognitionObjectFactory, config);
        }
        lock.unlock();
        return recognitionGenericObjectPool;
    }
}
```

Obtain a `Recognition` object from the object pool
If the number of objects currently in use exceeds the maximum capacity of the object pool, the system creates a new `Recognition` object. This new object requires re-initialization and a new WebSocket connection. It cannot use existing resources from the object pool and therefore does not benefit from reuse.

```java
recognizer = RecognitionObjectPool.getInstance().borrowObject();
```

Perform speech recognition
You can invoke the `call` or `streamCall` method of the `Recognition` object to perform speech recognition.

Return the `Recognition` object

After the speech recognition task is complete, you must return the `Recognition` object so that subsequent tasks can reuse it.
Do not return objects from incomplete or failed tasks.
```java
RecognitionObjectPool.getInstance().returnObject(recognizer);
```
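The borrow/use/return life cycle above can be sketched with a hand-rolled pool from the standard library. This is a conceptual illustration only: `SimpleRecognizer` is a stand-in for the SDK's `Recognition` class, and `ArrayBlockingQueue` stands in for the commons-pool2 `GenericObjectPool`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BorrowReturnSketch {
    // Stand-in for the SDK's Recognition class; not a real DashScope type.
    static class SimpleRecognizer {
        String recognize(String audioChunk) {
            return "transcript of " + audioChunk;
        }
    }

    // A fixed-size pool of pre-created objects, analogous to the object pool.
    static final BlockingQueue<SimpleRecognizer> POOL = new ArrayBlockingQueue<>(4);
    static {
        for (int i = 0; i < 4; i++) {
            POOL.offer(new SimpleRecognizer());
        }
    }

    static String run(String audioChunk) {
        SimpleRecognizer recognizer;
        try {
            recognizer = POOL.take(); // borrow: blocks until an object is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while borrowing", e);
        }
        try {
            return recognizer.recognize(audioChunk); // use the pooled object
        } finally {
            POOL.offer(recognizer); // return it so subsequent tasks can reuse it
        }
    }
}
```

Note that this sketch always returns the object in `finally` because its fake task cannot fail; in the real code, a failed task must invalidate the object instead, as described in the exception-handling section.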
Complete code
```java
package org.alibaba.bailian.example.examples;

import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.ApiKey;
import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.commons.pool2.impl.GenericObjectPool;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Before making high-concurrency calls to the ASR service, configure the
 * connection pool size through the following environment variables:
 *
 * DASHSCOPE_MAXIMUM_ASYNC_REQUESTS=2000
 * DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST=2000
 * DASHSCOPE_CONNECTION_POOL_SIZE=2000
 *
 * The default is 32. We recommend that you set it to twice the maximum
 * number of concurrent connections of a single server.
 */
public class Main {
    public static void checkoutEnv(String envName, int defaultSize) {
        if (System.getenv(envName) != null) {
            System.out.println("[ENV CHECK]: " + envName + " " + System.getenv(envName));
        } else {
            System.out.println("[ENV CHECK]: " + envName + " using default, which is " + defaultSize);
        }
    }

    public static void main(String[] args) throws NoApiKeyException, InterruptedException {
        // Check the connection pool environment variables.
        checkoutEnv("DASHSCOPE_CONNECTION_POOL_SIZE", 32);
        checkoutEnv("DASHSCOPE_MAXIMUM_ASYNC_REQUESTS", 32);
        checkoutEnv("DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST", 32);
        checkoutEnv(RecognitionObjectPool.RECOGNITION_OBJECTPOOL_SIZE_ENV,
                RecognitionObjectPool.DEFAULT_OBJECT_POOL_SIZE);
        int threadNums = 3;
        String currentDir = System.getProperty("user.dir");
        // Replace the paths with your audio sources.
        Path[] filePaths = {
            Paths.get(currentDir, "asr_example.wav"),
            Paths.get(currentDir, "asr_example.wav"),
            Paths.get(currentDir, "asr_example.wav"),
        };
        // Use a thread pool to run the recognition tasks.
        ExecutorService executorService = Executors.newFixedThreadPool(threadNums);
        for (int i = 0; i < threadNums; i++) {
            executorService.submit(new RealtimeRecognizeTask(filePaths));
        }
        executorService.shutdown();
        // Wait for all tasks to complete.
        executorService.awaitTermination(10, TimeUnit.MINUTES);
        System.exit(0);
    }
}

class RecognitionObjectFactory extends BasePooledObjectFactory<Recognition> {
    @Override
    public Recognition create() {
        return new Recognition();
    }

    @Override
    public PooledObject<Recognition> wrap(Recognition obj) {
        return new DefaultPooledObject<>(obj);
    }
}

class RecognitionObjectPool {
    public static GenericObjectPool<Recognition> recognitionGenericObjectPool;
    public static final String RECOGNITION_OBJECTPOOL_SIZE_ENV = "RECOGNITION_OBJECTPOOL_SIZE";
    public static final int DEFAULT_OBJECT_POOL_SIZE = 500;
    private static final Lock lock = new ReentrantLock();

    public static int getObjectivePoolSize() {
        try {
            return Integer.parseInt(System.getenv(RECOGNITION_OBJECTPOOL_SIZE_ENV));
        } catch (NumberFormatException e) {
            return DEFAULT_OBJECT_POOL_SIZE;
        }
    }

    public static GenericObjectPool<Recognition> getInstance() {
        lock.lock();
        try {
            if (recognitionGenericObjectPool == null) {
                // You can set the object pool size here or in the RECOGNITION_OBJECTPOOL_SIZE
                // environment variable. We recommend that you set it to 1.5 to 2 times your
                // server's maximum number of concurrent connections.
                int objectPoolSize = getObjectivePoolSize();
                System.out.println("RECOGNITION_OBJECTPOOL_SIZE: " + objectPoolSize);
                RecognitionObjectFactory recognitionObjectFactory = new RecognitionObjectFactory();
                GenericObjectPoolConfig<Recognition> config = new GenericObjectPoolConfig<>();
                config.setMaxTotal(objectPoolSize);
                config.setMaxIdle(objectPoolSize);
                config.setMinIdle(objectPoolSize);
                recognitionGenericObjectPool = new GenericObjectPool<>(recognitionObjectFactory, config);
            }
            return recognitionGenericObjectPool;
        } finally {
            lock.unlock();
        }
    }
}

class RealtimeRecognizeTask implements Runnable {
    private static final Object lock = new Object();
    private final Path[] filePaths;

    public RealtimeRecognizeTask(Path[] filePaths) {
        this.filePaths = filePaths;
    }

    /**
     * Gets your DashScope API key. If you have set DASHSCOPE_API_KEY as an
     * environment variable, the SDK automatically reads the API key from it.
     */
    private static String getDashScopeApiKey() {
        String dashScopeApiKey = null;
        try {
            dashScopeApiKey = ApiKey.getApiKey(null); // Retrieve from the environment variable.
        } catch (NoApiKeyException e) {
            System.out.println("No API key found in environment.");
        }
        if (dashScopeApiKey == null) {
            // If you cannot set the API key as an environment variable,
            // you can set it here in code.
            dashScopeApiKey = "your-dashscope-apikey";
        }
        return dashScopeApiKey;
    }

    public void runCallback() {
        for (Path filePath : filePaths) {
            // Create the recognition parameters. You can customize parameters
            // such as model, format, and sample_rate.
            RecognitionParam param;
            try {
                param = RecognitionParam.builder()
                        .model("paraformer-realtime-v2")
                        .format("pcm") // 'pcm', 'wav', 'opus', 'speex', 'aac', or 'amr'.
                                       // See the documentation for supported formats.
                        .sampleRate(16000) // Supported sample rates: 8000 and 16000.
                        .apiKey(getDashScopeApiKey())
                        .build();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            Recognition recognizer = null;
            // Records whether onError was received or an exception occurred.
            final boolean[] hasError = {false};
            try {
                recognizer = RecognitionObjectPool.getInstance().borrowObject();
                String threadName = Thread.currentThread().getName();
                ResultCallback<RecognitionResult> callback = new ResultCallback<RecognitionResult>() {
                    @Override
                    public void onEvent(RecognitionResult message) {
                        synchronized (lock) {
                            if (message.isSentenceEnd()) {
                                System.out.println("[process " + threadName + "] Fix: "
                                        + message.getSentence().getText());
                            } else {
                                System.out.println("[process " + threadName + "] Result: "
                                        + message.getSentence().getText());
                            }
                        }
                    }

                    @Override
                    public void onComplete() {
                        System.out.println("[" + threadName + "] Recognition complete");
                    }

                    @Override
                    public void onError(Exception e) {
                        System.out.println("[" + threadName + "] RecognitionCallback error: " + e.getMessage());
                        hasError[0] = true;
                    }
                };
                // Replace the path with your audio file path.
                System.out.println("[" + threadName + "] Input file_path is: " + filePath);
                // Set the parameters and the callback.
                recognizer.call(param, callback);
                // Read the file and send the audio in chunks. A 3,200-byte chunk
                // is 100 ms of audio at a 16 kHz sample rate.
                try (FileInputStream fis = new FileInputStream(filePath.toFile())) {
                    byte[] buffer = new byte[3200];
                    int bytesRead;
                    while ((bytesRead = fis.read(buffer)) != -1) {
                        // Send the chunk to the recognition instance.
                        recognizer.sendAudioFrame(ByteBuffer.wrap(buffer, 0, bytesRead));
                        Thread.sleep(100);
                        // Allocate a new buffer because the previous one may still be in use.
                        buffer = new byte[3200];
                    }
                }
                System.out.println("[" + threadName + "] send audio done");
                recognizer.stop();
                System.out.println("[" + threadName + "] asr task finished");
            } catch (Exception e) {
                e.printStackTrace();
                hasError[0] = true;
            }
            if (recognizer != null) {
                try {
                    if (hasError[0]) {
                        // An error occurred: close the connection and invalidate the object.
                        recognizer.getDuplexApi().close(1000, "bye");
                        RecognitionObjectPool.getInstance().invalidateObject(recognizer);
                    } else {
                        // No error occurred: return the object to the pool for reuse.
                        RecognitionObjectPool.getInstance().returnObject(recognizer);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }

    @Override
    public void run() {
        runCallback();
    }
}
```

Recommended configurations
The following configurations are based on test results from servers that ran only the Paraformer real-time speech recognition service on Alibaba Cloud instances with the specified specifications. An excessive number of concurrent requests may delay task processing.
The maximum concurrency of a single server is the number of Paraformer real-time speech recognition tasks that are running simultaneously. This is equivalent to the number of worker threads.
| Server configuration (Alibaba Cloud) | Maximum concurrency of a single server | Object pool size | Connection pool size |
| --- | --- | --- | --- |
| 4 cores, 8 GiB | 100 | 500 | 2000 |
| 8 cores, 16 GiB | 200 | 500 | 2000 |
| 16 cores, 32 GiB | 400 | 500 | 2000 |
Resource management and exception handling
Task success: When a speech recognition task completes successfully, you must call the `returnObject` method of `GenericObjectPool` to return the `Recognition` object to the pool for reuse.
In the sample code, this operation is performed by `RecognitionObjectPool.getInstance().returnObject(recognizer)`.

Important: Do not return `Recognition` objects from incomplete or failed tasks.
Task failure: If an exception within the SDK or your business logic interrupts a task, you must perform the following two operations:
Explicitly close the underlying WebSocket connection.
Invalidate the object in the object pool to prevent it from being reused.
```java
// This corresponds to the following content in the sample code:
// Close the connection.
recognizer.getDuplexApi().close(1000, "bye");
// Invalidate the recognizer that caused the exception.
RecognitionObjectPool.getInstance().invalidateObject(recognizer);
```

No additional handling is required when a `TaskFailed` error occurs.
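The return-or-invalidate decision can be sketched with a hand-rolled standard-library pool. This is an illustration of the pattern, not commons-pool2 itself: a healthy object goes back to the pool, while a failed one is discarded and replaced so that a broken connection is never reused.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class InvalidateSketch {
    // Stand-in for a pooled object that owns a connection.
    static class Conn {
        boolean broken = false;
    }

    static final BlockingQueue<Conn> POOL = new ArrayBlockingQueue<>(2);
    static {
        POOL.offer(new Conn());
        POOL.offer(new Conn());
    }

    // Mirrors the sample code's branch on hasError: return the object on
    // success, invalidate (discard and back-fill) on failure.
    static void release(Conn conn, boolean hasError) {
        if (hasError) {
            // Discard the broken object; offering a fresh one mirrors the pool
            // factory creating a replacement after invalidateObject().
            POOL.offer(new Conn());
        } else {
            POOL.offer(conn); // healthy object: return it for reuse
        }
    }
}
```

After a failed release, the pool is back at full capacity but no longer contains the broken instance, which is exactly the property that invalidation provides.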
Call warm-up and latency statistics
When you evaluate the performance of the DashScope Java SDK, for example by measuring concurrent call latency, we recommend that you perform a full warm-up before you start the formal test. A warm-up ensures that the measurements reflect the service's true performance in a stable state and avoids skewed results caused by initial connection setup time.
Connection reuse mechanism
The DashScope Java SDK uses a global singleton connection pool to efficiently manage and reuse WebSocket connections. This is designed to reduce the overhead of frequent connections and disconnections and to improve processing capabilities in high-concurrency scenarios.
This mechanism has the following characteristics:
On-demand creation: The SDK does not pre-create WebSocket connections at service startup. Instead, it establishes them on demand during the first call.
Time-limited reuse: After a request is completed, the connection is kept in the pool for up to 60 seconds for reuse.
If a new request arrives within 60 seconds, an existing connection is reused to avoid repeated handshake overhead.
If a connection is idle for more than 60 seconds, it is automatically closed to release resources.
Importance of warm-up
In the following scenarios, the connection pool may not have active connections available for reuse. This forces requests to create new connections:
The application has just started and has not made any calls.
The service has been idle for more than 60 seconds, and the connections in the pool have been closed due to a timeout.
In these scenarios, the first few requests trigger the full WebSocket connection process, including the TCP handshake, Transport Layer Security (TLS) negotiation, and protocol upgrade. The end-to-end latency of these requests is significantly higher than that of subsequent requests that can reuse connections. This extra time is due to network connection initialization, not the service's own processing latency. Therefore, without a warm-up, performance test results will be biased because they include the initial connection overhead.
Recommended practices
To obtain reliable performance data, you can follow these warm-up steps before you start formal performance stress testing or latency statistics collection:
Simulate the concurrency level of your formal test by making several calls in advance (for example, for 1 to 2 minutes) to fully populate the connection pool.
After you confirm that the connection pool has established and maintained enough active connections, you can start collecting formal performance data.
A proper warm-up allows the SDK connection pool to enter a stable reuse state. This lets you measure more representative latency metrics that truly reflect the service's performance during stable online operation.
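The effect of warm-up on measured latency can be illustrated with a toy client. The sleep durations and the `connected` flag are contrived stand-ins for WebSocket setup (TCP handshake, TLS negotiation, protocol upgrade) and per-request processing; this is not SDK code.

```java
class WarmupSketch {
    private boolean connected = false;

    // Returns the simulated end-to-end latency of one call, in milliseconds.
    long call() {
        long start = System.nanoTime();
        if (!connected) {
            sleepQuietly(50); // simulated one-time connection setup cost
            connected = true;
        }
        sleepQuietly(1); // simulated per-request processing time
        return (System.nanoTime() - start) / 1_000_000;
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        WarmupSketch client = new WarmupSketch();
        long coldCall = client.call(); // warm-up call: pays the connection cost
        long warmCall = client.call(); // steady-state call: reuses the "connection"
        System.out.println("cold: " + coldCall + " ms, warm: " + warmCall + " ms");
    }
}
```

Averaging the cold call together with warm calls inflates the measured latency, which is precisely the bias that a warm-up phase removes from your statistics.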