Alibaba Cloud Model Studio: Performance optimization for real-time speech recognition in high-concurrency scenarios

Last Updated: Jan 08, 2026

The Paraformer real-time speech recognition service uses the WebSocket protocol for real-time streaming communication. In high-concurrency scenarios, creating and destroying a WebSocket connection for each request consumes significant resources and adds noticeable latency. To optimize performance and ensure stability, the DashScope software development kit (SDK) provides resource reuse mechanisms, such as connection pools and object pools. This document describes how to use these features in the DashScope Java SDK to efficiently call the Paraformer real-time speech recognition service in high-concurrency scenarios.

Important

To use a model in the China (Beijing) region, go to the API key page for the China (Beijing) region.

User guide: For model descriptions and selection guidance, see Real-time speech recognition - Fun-ASR/Paraformer.

Prerequisites

The Java SDK uses a built-in connection pool and a custom object pool to deliver optimal performance.

  • Connection pool: The SDK integrates an OkHttp3 connection pool to manage and reuse underlying WebSocket connections. This reduces network handshake overhead. This feature is enabled by default.

  • Object pool: This feature is based on commons-pool2 and maintains a group of Recognition objects with pre-established connections. Retrieving an object from the pool eliminates connection establishment latency and significantly reduces first-packet latency.

Implementation steps

  1. Add dependencies

    Add the dashscope-sdk-java and commons-pool2 dependencies to your project's configuration file.

    The following sections provide examples for Maven and Gradle:

    Maven

    1. Open the pom.xml file of your Maven project.

    2. Add the following dependencies to the <dependencies> tag.

    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>dashscope-sdk-java</artifactId>
        <!-- Replace 'the-latest-version' with 2.16.9 or a later version. You can find the version number at the following URL: https://mvnrepository.com/artifact/com.alibaba/dashscope-sdk-java -->
        <version>the-latest-version</version>
    </dependency>
    
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-pool2</artifactId>
        <!-- Replace 'the-latest-version' with the latest version. You can find the version number at the following URL: https://mvnrepository.com/artifact/org.apache.commons/commons-pool2 -->
        <version>the-latest-version</version>
    </dependency>
    3. Save the pom.xml file.

    4. Run a Maven command, such as mvn clean install or mvn compile, to update the project dependencies.

    Gradle

    1. Open the build.gradle file of your Gradle project.

    2. Add the following dependencies to the dependencies block.

      dependencies {
          // Replace 'the-latest-version' with 2.16.9 or a later version. You can find the version number at the following URL: https://mvnrepository.com/artifact/com.alibaba/dashscope-sdk-java
          implementation group: 'com.alibaba', name: 'dashscope-sdk-java', version: 'the-latest-version'
          
          // Replace 'the-latest-version' with the latest version. You can find the version number at the following URL: https://mvnrepository.com/artifact/org.apache.commons/commons-pool2
          implementation group: 'org.apache.commons', name: 'commons-pool2', version: 'the-latest-version'
      }
    3. Save the build.gradle file.

    4. In the command line, switch to the root directory of your project and run the following Gradle command to update the project dependencies.

      ./gradlew build --refresh-dependencies

      If you use a Windows operating system, run the following command:

      gradlew build --refresh-dependencies
  2. Configure the connection pool

    You can configure key parameters for the connection pool using environment variables:

    | Environment variable | Description |
    | --- | --- |
    | DASHSCOPE_CONNECTION_POOL_SIZE | The connection pool size. Recommended value: more than twice the peak concurrency. Default value: 32. |
    | DASHSCOPE_MAXIMUM_ASYNC_REQUESTS | The maximum number of asynchronous requests. Recommended value: the same as DASHSCOPE_CONNECTION_POOL_SIZE. Default value: 32. |
    | DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST | The maximum number of asynchronous requests for a single host. Recommended value: the same as DASHSCOPE_CONNECTION_POOL_SIZE. Default value: 32. |

  3. Configure the object pool

    You can configure the object pool size using an environment variable:

    | Environment variable | Description |
    | --- | --- |
    | RECOGNITION_OBJECTPOOL_SIZE | The object pool size. Recommended value: 1.5 to 2 times the peak concurrency. Default value: 500. |

    Important
    • The size of the object pool (RECOGNITION_OBJECTPOOL_SIZE) must be less than or equal to the size of the connection pool (DASHSCOPE_CONNECTION_POOL_SIZE). Otherwise, if the connection pool is full when the object pool requests an object, the calling thread blocks until a connection becomes available.

    • The object pool size must not exceed the queries per second (QPS) limit of your account.

    You can create the object pool using the following code:

    class RecognitionObjectPool {
        // ... other code omitted here, see the full code for the complete example
        public static GenericObjectPool<Recognition> getInstance() {
            lock.lock();
            try {
                if (recognitionGenericObjectPool == null) {
                    // You can set the object pool size here or in the RECOGNITION_OBJECTPOOL_SIZE environment variable.
                    // We recommend that you set it to 1.5 to 2 times your server's maximum concurrent connections.
                    int objectPoolSize = getObjectivePoolSize();
                    System.out.println("RECOGNITION_OBJECTPOOL_SIZE: "
                            + objectPoolSize);
                    RecognitionObjectFactory recognitionObjectFactory =
                            new RecognitionObjectFactory();
                    GenericObjectPoolConfig<Recognition> config =
                            new GenericObjectPoolConfig<>();
                    config.setMaxTotal(objectPoolSize);
                    config.setMaxIdle(objectPoolSize);
                    config.setMinIdle(objectPoolSize);
                    recognitionGenericObjectPool =
                            new GenericObjectPool<>(recognitionObjectFactory, config);
                }
                return recognitionGenericObjectPool;
            } finally {
                // Release the lock even if pool creation throws.
                lock.unlock();
            }
        }
    }
  4. Obtain a Recognition object from the object pool

    If the number of objects currently in use exceeds the maximum capacity of the object pool, the system creates a new Recognition object.

    This new object requires re-initialization and a new WebSocket connection. It cannot use existing resources from the object pool and therefore does not benefit from reuse.

    recognizer = RecognitionObjectPool.getInstance().borrowObject();
  5. Perform speech recognition

    You can invoke the call or streamCall method of the Recognition object to perform speech recognition.

  6. Return the Recognition object

    After the speech recognition task is complete, you must return the Recognition object so that subsequent tasks can reuse it.

    Do not return objects from incomplete or failed tasks.

    RecognitionObjectPool.getInstance().returnObject(recognizer);
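
The sizing rules from steps 2 and 3 can be sketched as a small calculation. The class `PoolSizing` and its method names are hypothetical and are not part of the DashScope SDK. The sketch assumes that "more than twice the peak concurrency" can be taken as `2 * peak + 1` and that the upper bound (2 times) of the object pool range is used. It does not model the QPS limit of your account, which you must still check separately.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the DashScope SDK): derives pool sizes
// from an estimated peak concurrency, following the rules in steps 2 and 3.
class PoolSizing {
    // Connection pool: strictly more than twice the peak concurrency.
    static int connectionPoolSize(int peakConcurrency) {
        requirePositive(peakConcurrency);
        return 2 * peakConcurrency + 1;
    }

    // Object pool: 1.5 to 2 times the peak concurrency; this sketch uses 2 times.
    static int objectPoolSize(int peakConcurrency) {
        requirePositive(peakConcurrency);
        return 2 * peakConcurrency;
    }

    // The object pool must not be larger than the connection pool.
    static boolean isConsistent(int objectPoolSize, int connectionPoolSize) {
        return objectPoolSize <= connectionPoolSize;
    }

    // Environment variable values for the given peak concurrency.
    static Map<String, Integer> recommendedEnv(int peakConcurrency) {
        int connections = connectionPoolSize(peakConcurrency);
        Map<String, Integer> env = new LinkedHashMap<>();
        env.put("DASHSCOPE_CONNECTION_POOL_SIZE", connections);
        env.put("DASHSCOPE_MAXIMUM_ASYNC_REQUESTS", connections);
        env.put("DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST", connections);
        env.put("RECOGNITION_OBJECTPOOL_SIZE", objectPoolSize(peakConcurrency));
        return env;
    }

    private static void requirePositive(int value) {
        if (value <= 0) {
            throw new IllegalArgumentException("peak concurrency must be positive");
        }
    }

    public static void main(String[] args) {
        // Print shell export lines for a peak concurrency of 100.
        recommendedEnv(100).forEach(
                (name, value) -> System.out.println("export " + name + "=" + value));
    }
}
```

Note that `isConsistent(500, 32)` returns false: if you keep the default object pool size of 500, the default connection pool size of 32 does not satisfy the constraint in step 3, so the connection pool size needs to be raised.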

Complete code

package org.alibaba.bailian.example.examples;

import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.ApiKey;
import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.commons.pool2.impl.GenericObjectPool;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

/**
 * Before making high-concurrency calls to the ASR service,
 * configure the connection pool size through the following environment
 * variables.
 *
 * DASHSCOPE_MAXIMUM_ASYNC_REQUESTS=2000
 * DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST=2000
 * DASHSCOPE_CONNECTION_POOL_SIZE=2000
 *
 * The default is 32. We recommend that you set it to twice the maximum
 * concurrent connections of a single server.
 */
public class Main {
    public static void checkoutEnv(String envName, int defaultSize) {
        if (System.getenv(envName) != null) {
            System.out.println("[ENV CHECK]: " + envName + " "
                    + System.getenv(envName));
        } else {
            System.out.println("[ENV CHECK]: " + envName
                    + " Using Default which is " + defaultSize);
        }
    }

    public static void main(String[] args)
            throws NoApiKeyException, InterruptedException {
        // Check for connection pool env
        checkoutEnv("DASHSCOPE_CONNECTION_POOL_SIZE", 32);
        checkoutEnv("DASHSCOPE_MAXIMUM_ASYNC_REQUESTS", 32);
        checkoutEnv("DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST", 32);
        checkoutEnv(RecognitionObjectPool.RECOGNITION_OBJECTPOOL_SIZE_ENV, RecognitionObjectPool.DEFAULT_OBJECT_POOL_SIZE);

        int threadNums = 3;
        String currentDir = System.getProperty("user.dir");
        // Replace the path with your audio source
        Path[] filePaths = {
                Paths.get(currentDir, "asr_example.wav"),
                Paths.get(currentDir, "asr_example.wav"),
                Paths.get(currentDir, "asr_example.wav"),
        };
        // Use ThreadPool to run recognition tasks
        ExecutorService executorService = Executors.newFixedThreadPool(threadNums);
        for (int i = 0; i < threadNums; i++) {
            executorService.submit(new RealtimeRecognizeTask(filePaths));
        }
        executorService.shutdown();
        // wait for all tasks to complete
        executorService.awaitTermination(10, TimeUnit.MINUTES);
        System.exit(0);
    }
}

class RecognitionObjectFactory extends BasePooledObjectFactory<Recognition> {
    public RecognitionObjectFactory() {
        super();
    }

    @Override
    public Recognition create() throws Exception {
        return new Recognition();
    }

    @Override
    public PooledObject<Recognition> wrap(Recognition obj) {
        return new DefaultPooledObject<>(obj);
    }
}

class RecognitionObjectPool {
    public static GenericObjectPool<Recognition> recognitionGenericObjectPool;
    public static String RECOGNITION_OBJECTPOOL_SIZE_ENV =
            "RECOGNITION_OBJECTPOOL_SIZE";
    public static int DEFAULT_OBJECT_POOL_SIZE = 500;
    private static Lock lock = new java.util.concurrent.locks.ReentrantLock();

    public static int getObjectivePoolSize() {
        try {
            Integer n = Integer.parseInt(System.getenv(RECOGNITION_OBJECTPOOL_SIZE_ENV));
            return n;
        } catch (NumberFormatException e) {
            return DEFAULT_OBJECT_POOL_SIZE;
        }
    }

    public static GenericObjectPool<Recognition> getInstance() {
        lock.lock();
        try {
            if (recognitionGenericObjectPool == null) {
                // You can set the object pool size here or in the environment variable
                // RECOGNITION_OBJECTPOOL_SIZE. We recommend that you set it to 1.5 to 2
                // times your server's maximum concurrent connections.
                int objectPoolSize = getObjectivePoolSize();
                System.out.println("RECOGNITION_OBJECTPOOL_SIZE: "
                        + objectPoolSize);
                RecognitionObjectFactory recognitionObjectFactory =
                        new RecognitionObjectFactory();
                GenericObjectPoolConfig<Recognition> config =
                        new GenericObjectPoolConfig<>();
                config.setMaxTotal(objectPoolSize);
                config.setMaxIdle(objectPoolSize);
                config.setMinIdle(objectPoolSize);
                recognitionGenericObjectPool =
                        new GenericObjectPool<>(recognitionObjectFactory, config);
            }
            return recognitionGenericObjectPool;
        } finally {
            // Release the lock even if pool creation throws.
            lock.unlock();
        }
    }
}

class RealtimeRecognizeTask implements Runnable {
    private static final Object lock = new Object();
    private Path[] filePaths;

    public RealtimeRecognizeTask(Path[] filePaths) {
        this.filePaths = filePaths;
    }

    /**
     * Set your DashScope API key.
     * If you have set DASHSCOPE_API_KEY in your environment variable, you
     * can ignore this. The SDK automatically gets the API key from the
     * environment variable.
     * */
    private static String getDashScopeApiKey() throws NoApiKeyException {
        String dashScopeApiKey = null;
        try {
            // Retrieve the API key from the environment variable.
            dashScopeApiKey = ApiKey.getApiKey(null);
        } catch (NoApiKeyException e) {
            System.out.println("No API key found in environment.");
        }
        if (dashScopeApiKey == null) {
            // If you cannot set the API key in your environment variable,
            // you can set it here in the code.
            dashScopeApiKey = "your-dashscope-apikey";
        }
        return dashScopeApiKey;
    }

    public void runCallback() {
        for (Path filePath : filePaths) {
            // Create recognition parameters.
            // You can customize the recognition parameters, such as model, format,
            // and sample_rate.
            RecognitionParam param = null;
            try {
                param =
                        RecognitionParam.builder()
                                .model("paraformer-realtime-v2")
                                .format(
                                        "pcm") // 'pcm', 'wav', 'opus', 'speex', 'aac', or 'amr'.
                                // You can check the documentation for supported formats.
                                .sampleRate(16000) // Supported sample rates: 8000 and 16000.
                                .apiKey(getDashScopeApiKey()) // Use getDashScopeApiKey to get the
                                // API key.
                                .build();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }

            Recognition recognizer = null;
            // Set to true when the callback reports an error (onError).
            final boolean[] hasError = {false};
            try {
                recognizer = RecognitionObjectPool.getInstance().borrowObject();

                String threadName = Thread.currentThread().getName();

                ResultCallback<RecognitionResult> callback =
                        new ResultCallback<RecognitionResult>() {
                            @Override
                            public void onEvent(RecognitionResult message) {
                                synchronized (lock) {
                                    if (message.isSentenceEnd()) {
                                        System.out.println("[process " + threadName
                                                + "] Fix:" + message.getSentence().getText());
                                    } else {
                                        System.out.println("[process " + threadName
                                                + "] Result: " + message.getSentence().getText());
                                    }
                                }
                            }

                            @Override
                            public void onComplete() {
                                System.out.println("[" + threadName + "] Recognition complete");
                            }

                            @Override
                            public void onError(Exception e) {
                                System.out.println("[" + threadName
                                        + "] RecognitionCallback error: " + e.getMessage());
                                hasError[0] = true;
                            }
                        };
                System.out.println(
                        "[" + threadName + "] Input file_path is: " + filePath);
                // Open the audio file. try-with-resources closes the stream even
                // if sending fails; if the file cannot be opened, the exception
                // is handled by the outer catch block and no task is started.
                try (FileInputStream fis = new FileInputStream(filePath.toFile())) {
                    // Start the task with the parameters and the callback.
                    recognizer.call(param, callback);

                    // 3200 bytes is 100 ms of 16 kHz, 16-bit mono audio.
                    byte[] buffer = new byte[3200];
                    int bytesRead;
                    // Read the file and send the audio in chunks.
                    while ((bytesRead = fis.read(buffer)) != -1) {
                        // wrap(buffer, 0, bytesRead) also covers a short final chunk.
                        recognizer.sendAudioFrame(ByteBuffer.wrap(buffer, 0, bytesRead));
                        Thread.sleep(100);
                        // Use a fresh array for each chunk in case the previous
                        // frame is still queued for sending.
                        buffer = new byte[3200];
                    }
                }
                System.out.println(
                        "[" + threadName + "] send audio done");
                recognizer.stop();
                System.out.println(
                        "[" + threadName + "] asr task finished");
            } catch (Exception e) {
                e.printStackTrace();
                hasError[0] = true;
            }
            if (recognizer != null) {
                try {
                    if (hasError[0] == true) {
                        // Invalidate the recognition object if an error occurs.
                        recognizer.getDuplexApi().close(1000, "bye");
                        RecognitionObjectPool.getInstance().invalidateObject(recognizer);
                    } else {
                        // Return the recognition object to the pool if no error or exception occurs.
                        RecognitionObjectPool.getInstance().returnObject(recognizer);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }

    @Override
    public void run() {
        runCallback();
    }
}

Recommended configurations

The following configurations are based on test results from running only the Paraformer real-time speech recognition service on Alibaba Cloud servers with the specified configurations. Many concurrent requests may cause task processing delays.

The maximum concurrency of a single server is the number of Paraformer real-time speech recognition tasks that are running simultaneously. This is equivalent to the number of worker threads.

| Server configuration (Alibaba Cloud) | Maximum concurrency of a single server | Object pool size | Connection pool size |
| --- | --- | --- | --- |
| 4-core 8 GiB | 100 | 500 | 2000 |
| 8-core 16 GiB | 200 | 500 | 2000 |
| 16-core 32 GiB | 400 | 500 | 2000 |

Resource management and exception handling

  • Task success: When a speech recognition task completes successfully, you must call the `returnObject` method of `GenericObjectPool` to return the `Recognition` object to the pool for reuse.

    In the sample code, this operation is performed by RecognitionObjectPool.getInstance().returnObject(recognizer).

    Important

    Do not return `Recognition` objects from incomplete or failed tasks.

  • Task failure: If an exception within the SDK or your business logic interrupts a task, you must perform the following two operations:

    1. Explicitly close the underlying WebSocket connection.

    2. Invalidate the object in the object pool to prevent it from being reused.

    // This corresponds to the following content in the current code:
    // Close the connection.
    recognizer.getDuplexApi().close(1000, "bye");
    // Invalidate the recognizer that caused the exception in the object pool.
    RecognitionObjectPool.getInstance().invalidateObject(recognizer);
  • No additional handling is required when a `TaskFailed` error occurs.
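
The success and failure paths above can be sketched as a single helper that always does exactly one of the two: return the object, or close and invalidate it. This is a minimal sketch, not SDK code. The class `PooledTaskRunner` and its parameter names are hypothetical, and the pool operations are passed in as plain functions so that the example has no dependency on commons-pool2.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper: borrow an object, run a task, then either return the
// object (success) or close and invalidate it (failure). Never both.
class PooledTaskRunner {
    static <T, R> R run(Supplier<T> borrow,
                        Function<T, R> task,
                        Consumer<T> returnToPool,
                        Consumer<T> closeAndInvalidate) {
        T pooled = borrow.get();
        boolean succeeded = false;
        try {
            R result = task.apply(pooled);
            succeeded = true;
            return result;
        } finally {
            if (succeeded) {
                // Task completed: the object can be reused by later tasks.
                returnToPool.accept(pooled);
            } else {
                // Task threw: the connection state is unknown, so do not reuse it.
                closeAndInvalidate.accept(pooled);
            }
        }
    }
}
```

With the sample code, `borrow` would wrap `borrowObject()`, `returnToPool` would wrap `returnObject(recognizer)`, and `closeAndInvalidate` would wrap `recognizer.getDuplexApi().close(1000, "bye")` followed by `invalidateObject(recognizer)`. The commons-pool2 methods declare checked exceptions, so those wrappers need their own try/catch. The task should also throw if the callback reported `onError`, so that the failure path is taken.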

Call warm-up and latency statistics

When you evaluate the performance of the DashScope Java SDK, for example, by measuring concurrent call latency, we recommend that you perform a full warm-up before you start the formal test. A warm-up ensures that the measurement results accurately reflect the service's true performance in a stable state. This practice avoids data drift caused by the initial connection time.

Connection reuse mechanism

The DashScope Java SDK uses a global singleton connection pool to efficiently manage and reuse WebSocket connections. This is designed to reduce the overhead of frequent connections and disconnections and to improve processing capabilities in high-concurrency scenarios.

This mechanism has the following characteristics:

  • On-demand creation: The SDK does not pre-create WebSocket connections at service startup. Instead, it establishes them on demand during the first call.

  • Time-limited reuse: After a request is completed, the connection is kept in the pool for up to 60 seconds for reuse.

    • If a new request arrives within 60 seconds, an existing connection is reused to avoid repeated handshake overhead.

    • If a connection is idle for more than 60 seconds, it is automatically closed to release resources.

Importance of warm-up

In the following scenarios, the connection pool may not have active connections available for reuse. This forces requests to create new connections:

  • The application has just started and has not made any calls.

  • The service has been idle for more than 60 seconds, and the connections in the pool have been closed due to a timeout.

In these scenarios, the first few requests trigger the full WebSocket connection process, including the TCP handshake, Transport Layer Security (TLS) negotiation, and protocol upgrade. The end-to-end latency of these requests is significantly higher than that of subsequent requests that can reuse connections. This extra time is due to network connection initialization, not the service's own processing latency. Therefore, without a warm-up, performance test results will be biased because they include the initial connection overhead.
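
The two scenarios above can be approximated in application code by tracking the time of the last call. The following is a rough sketch under the assumption that a single process-wide timestamp is a good enough proxy; it cannot see the actual state of individual connections in the SDK pool. The class `IdleTracker` is hypothetical, and the 60-second limit is taken from the reuse window described above.

```java
import java.time.Duration;

// Hypothetical tracker: reports whether the next call is likely to open a new
// connection, either because no call was made yet or because the last call
// was more than 60 seconds ago.
class IdleTracker {
    static final Duration IDLE_LIMIT = Duration.ofSeconds(60);

    private boolean called = false;
    private long lastCallNanos = 0L;

    // Record a call at the given monotonic time, for example System.nanoTime().
    synchronized void recordCall(long nowNanos) {
        called = true;
        lastCallNanos = nowNanos;
    }

    // True if no call was recorded yet or the idle limit has been exceeded.
    synchronized boolean needsWarmUp(long nowNanos) {
        if (!called) {
            return true;
        }
        return nowNanos - lastCallNanos > IDLE_LIMIT.toNanos();
    }
}
```

A latency report could use `needsWarmUp(System.nanoTime())` to tag or exclude requests that probably paid the connection setup cost.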

Recommended practices

To obtain reliable performance data, you can follow these warm-up steps before you start formal performance stress testing or latency statistics collection:

  1. Simulate the concurrency level of your formal test by making several calls in advance (for example, for 1 to 2 minutes) to fully populate the connection pool.

  2. After you confirm that the connection pool has established and maintained enough active connections, you can start collecting formal performance data.

A proper warm-up allows the SDK connection pool to enter a stable reuse state. This lets you measure more representative latency metrics that truly reflect the service's performance during stable online operation.
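
The warm-up steps above can be sketched as a small harness that runs the same call at the same concurrency twice and keeps only the second set of latencies. This is a simplified sketch: the class `WarmUpBenchmark` is hypothetical, the warm-up is counted in rounds rather than the 1 to 2 minutes suggested above, and `call` stands in for one complete recognition task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness: warm up at the target concurrency, discard those
// results, then collect latencies for the measured rounds only.
class WarmUpBenchmark {
    // Runs concurrency * rounds calls on a fixed thread pool and returns
    // the latency of each call in nanoseconds.
    static List<Long> runRounds(Runnable call, int concurrency, int rounds) {
        ExecutorService workers = Executors.newFixedThreadPool(concurrency);
        try {
            Callable<Long> timedCall = () -> {
                long start = System.nanoTime();
                call.run();
                return System.nanoTime() - start;
            };
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < concurrency * rounds; i++) {
                futures.add(workers.submit(timedCall));
            }
            List<Long> latencies = new ArrayList<>();
            for (Future<Long> future : futures) {
                latencies.add(future.get());
            }
            return latencies;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for calls", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a call failed", e.getCause());
        } finally {
            workers.shutdown();
        }
    }

    // Warm-up latencies are discarded; only the measured rounds are returned.
    static List<Long> measure(Runnable call, int concurrency,
                              int warmUpRounds, int measuredRounds) {
        runRounds(call, concurrency, warmUpRounds);
        return runRounds(call, concurrency, measuredRounds);
    }
}
```

Because the SDK connection pool is described above as a global singleton, connections opened during the warm-up rounds remain available to the measured rounds as long as less than 60 seconds pass between them.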

Common Java SDK exceptions

Exception 1: The service traffic is stable, but the number of TCP connections on the server continues to increase

Cause:

Type 1:

Each SDK object requests a connection upon creation. If you do not use an object pool, the object is destroyed after each task completes. This action leaves the connection in an unreferenced state, and it is disconnected only after the server-side connection timeout of 61 seconds. Consequently, the connection cannot be reused during this 61-second period.

In high-concurrency scenarios, a new task creates a new connection if no reusable connections are available. This leads to the following issues:

  1. The number of connections continues to increase.

  2. Server performance degrades because an excessive number of connections consumes available server resources.

  3. The connection pool becomes full, and new tasks are blocked while they wait for available connections.

Type 2:

The `MaxIdle` parameter of the object pool is set to a value that is smaller than the `MaxTotal` parameter. As a result, when the pool has idle objects, any objects that exceed the `MaxIdle` limit are destroyed. This process can cause connection leaks. These leaked connections are disconnected only after a 61-second timeout. Similar to the Type 1 cause, this leads to a continuous increase in the number of connections.

Solution:

For the Type 1 cause, use an object pool.

For the Type 2 cause, check the object pool configuration parameters. Set `MaxIdle` and `MaxTotal` to the same value, and disable the object pool's automatic eviction (destruction) policy.

Exception 2: The task takes 60 seconds longer than a normal call

The cause is the same as for Exception 1. The connection pool has reached its maximum number of connections. A new task must wait 61 seconds for an unreferenced connection to time out before the task can obtain a new connection.

Exception 3: Tasks are slow when the service starts and then gradually return to normal

Cause:

During high-concurrency calls, each object reuses its WebSocket connection across multiple tasks, so WebSocket connections are typically created only when the service starts. If high-concurrency calls begin immediately during the service startup stage, too many WebSocket connections are created at the same time, which may cause blocking.

Solution:

Gradually increase the concurrency, or run warm-up tasks after the service starts.
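
One way to increase concurrency gradually is to start workers in batches with a pause between batches. The following is a minimal sketch under that assumption. The class `RampUpPlan`, the step size, and the pause length are hypothetical and should be tuned for your service. `startWorkers` stands in for whatever your application does to start the given number of additional recognition workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical ramp-up plan: reach the target concurrency in steps instead of
// opening every WebSocket connection at the same moment.
class RampUpPlan {
    // Concurrency level after each step, for example (100, 30) gives 30, 60, 90, 100.
    static List<Integer> levels(int target, int step) {
        if (target <= 0 || step <= 0) {
            throw new IllegalArgumentException("target and step must be positive");
        }
        List<Integer> levels = new ArrayList<>();
        for (int level = step; level < target; level += step) {
            levels.add(level);
        }
        levels.add(target);
        return levels;
    }

    // Starts the additional workers needed for each level, pausing between levels.
    static void rampUp(int target, int step, long pauseMillis, IntConsumer startWorkers) {
        int running = 0;
        for (int level : levels(target, step)) {
            startWorkers.accept(level - running);
            running = level;
            if (running < target && pauseMillis > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Stop ramping up and preserve the interrupt status.
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }
}
```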

Exception 4: The server reports the "Invalid action('run-task')! Please follow the protocol!" error

Cause:

When a client-side error occurs, the server is not notified, and the connection remains in a task-in-progress state. If this connection and its associated object are then reused for a new task, a protocol error occurs, which causes the new task to fail.

Solution:

After a client-side exception is thrown, you must explicitly close the WebSocket connection and invalidate the object in the object pool instead of returning it, so that the object is not reused.

Exception 5: The service traffic is stable, but the call volume has abnormal spikes

Cause:

Creating too many WebSocket connections at the same time causes blocking. Because incoming service traffic continues, a short-term backlog of tasks is created. After the blocking is resolved, all backlogged tasks are called at once. This causes a spike in call volume that can momentarily exceed the concurrency limit for your Alibaba Cloud account, which can result in task failures, server performance degradation, and other issues.

Creating too many WebSocket connections at once typically occurs in the following scenarios:

  • The service is in its startup stage.

  • A network exception occurs that causes many WebSocket connections to be interrupted and reconnected at the same time.

  • Many server-side errors occur at the same time, which leads to many WebSocket reconnections. A common error occurs when the concurrency exceeds the account limit ("Requests rate limit exceeded, please try again later.").

Solution:

  1. Check your network conditions.

  2. Check whether many other server-side errors occurred before the spike.

  3. Increase the concurrency limit for your Alibaba Cloud account.

  4. Reduce the sizes of the object pool and connection pool. You can also limit the maximum concurrency using the upper limit of the object pool.

  5. Upgrade your server configuration or increase the number of servers.
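
Item 4 caps concurrency through the upper limit of the object pool. As an alternative illustration of the same idea, the following sketch caps the number of tasks in flight with a plain `java.util.concurrent.Semaphore`. This is a swapped-in technique and not the object pool mechanism itself. The class `ConcurrencyLimiter` is hypothetical, and the limit should stay at or below the concurrency limit of your account.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical limiter: at most maxConcurrent tasks run at once. Extra callers
// wait, so a backlog drains at a bounded rate instead of as one spike.
class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        if (maxConcurrent <= 0) {
            throw new IllegalArgumentException("maxConcurrent must be positive");
        }
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the task while holding one permit; the permit is released even if
    // the task throws.
    <R> R call(Supplier<R> task) {
        permits.acquireUninterruptibly();
        try {
            return task.get();
        } finally {
            permits.release();
        }
    }

    // Number of additional tasks that could start right now.
    int available() {
        return permits.availablePermits();
    }
}
```

Each worker thread would wrap one complete recognition task in `call(...)`, so that borrowing from the object pool, streaming the audio, and returning the object all happen while the permit is held.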

Exception 6: All tasks slow down as the concurrency increases

Solution:

  1. Check whether you have reached the network bandwidth limit.

  2. Check whether the actual concurrency is too high for your server's specifications.