High-concurrency scenarios for CosyVoice - Alibaba Cloud Model Studio

The CosyVoice speech synthesis service uses the WebSocket protocol for real-time streaming communication. In high-concurrency scenarios, creating and destroying a WebSocket connection for each request consumes significant resources and introduces connection latency. To optimize performance and ensure stability, the DashScope SDK provides resource reuse mechanisms, such as connection pools and object pools. This document describes how to use these features in the DashScope Python and Java SDKs to efficiently call the CosyVoice service in high-concurrency scenarios.

Important

To use a model in the China (Beijing) region, create your API key on the API key page for the China (Beijing) region.

Prerequisites

You have created an API key.
You have installed a compatible version of the DashScope SDK. We recommend installing the latest version:
- Python SDK: Version 1.25.2 or later
- Java SDK: Version 2.16.6 or later

Python SDK: Object pool optimization

The Python SDK provides the SpeechSynthesizerObjectPool class, which manages and reuses SpeechSynthesizer objects.

When the object pool is initialized, it creates a specified number of SpeechSynthesizer instances and establishes WebSocket connections in advance. When you retrieve an object from the pool, you can send a request directly without waiting for a connection to be established. This practice effectively reduces first-packet latency. When a task is complete and the object is returned to the pool, its WebSocket connection remains active and ready for the next task.
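
The reuse behavior described above can be illustrated with a toy pool. This is a minimal sketch, not the SDK's implementation: `FakeSynthesizer` and `ToyObjectPool` are hypothetical stand-ins that perform no network I/O, and the pool size of 1 is chosen only to make the reuse easy to see.

```python
import queue


class FakeSynthesizer:
    """Hypothetical stand-in for a SpeechSynthesizer; no network I/O happens."""

    def __init__(self):
        # In a real client, this is where the connection setup cost would be paid.
        self.connected = True
        self.tasks_done = 0

    def call(self, text):
        self.tasks_done += 1
        return f'audio for: {text}'


class ToyObjectPool:
    """Pre-creates max_size objects and hands them out for reuse."""

    def __init__(self, max_size):
        self._idle = queue.Queue()
        for _ in range(max_size):
            self._idle.put(FakeSynthesizer())  # "connect" ahead of time

    def borrow(self):
        try:
            return self._idle.get_nowait()  # reuse a warm object
        except queue.Empty:
            return FakeSynthesizer()  # pool exhausted: pay the setup cost again

    def give_back(self, synthesizer):
        self._idle.put(synthesizer)  # the object stays ready for the next task


pool = ToyObjectPool(max_size=1)
first = pool.borrow()
first.call('hello')
pool.give_back(first)

second = pool.borrow()  # the same warm object comes back
third = pool.borrow()   # the pool is empty now, so a new object is created
print(second is first, third is second)  # True False
```

The second borrow skips the setup step entirely, which is where the first-packet latency saving comes from. The third borrow shows the overflow case, where a fresh object has to be set up from scratch.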

Implementation steps

Install dependencies: Install the DashScope dependency by running `pip install -U dashscope`.
Create and configure the object pool.
Set the object pool size with the `max_size` parameter of SpeechSynthesizerObjectPool. We recommend setting this value to 1.5 to 2 times your peak concurrency. The object pool size must not exceed the queries per second (QPS) limit for your account.
Use the following code to create a global singleton object pool with a fixed size. When the object pool is initialized, it creates the specified number of SpeechSynthesizer objects and establishes WebSocket connections. This process takes some time.
```
from dashscope.audio.tts_v2 import SpeechSynthesizerObjectPool

connectionPool = SpeechSynthesizerObjectPool(max_size=20)
```
Retrieve a SpeechSynthesizer object from the object pool.
If all objects in the pool are already in use, the SDK creates a new SpeechSynthesizer object.
This new object is initialized from scratch and must establish its own WebSocket connection. It cannot use existing connection resources from the pool and therefore does not benefit from resource reuse.
```
speech_synthesizer = connectionPool.borrow_synthesizer(
    model='cosyvoice-v3-flash',
    voice='longanyang',
    seed=12382,
    callback=synthesizer_callback
)
```
Perform speech synthesis.
Call the `call` or `streaming_call` method of the SpeechSynthesizer object to perform speech synthesis.
Return the SpeechSynthesizer object.
After the speech synthesis task is complete, return the SpeechSynthesizer object so that it can be reused by subsequent tasks.
Do not return objects from incomplete or failed tasks.
```
connectionPool.return_synthesizer(speech_synthesizer)
```
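
The sizing rule from the steps above (1.5 to 2 times the peak concurrency, capped by the account's QPS limit) can be written as a small helper. This is only a sketch: `recommended_pool_size` and its parameters are hypothetical names, not part of the DashScope SDK, and the peak concurrency and QPS limit are values you have to supply yourself.

```python
import math


def recommended_pool_size(peak_concurrency, qps_limit, factor=1.5):
    """Return factor x peak concurrency, rounded up and capped at the QPS limit."""
    if not 1.5 <= factor <= 2.0:
        raise ValueError('factor is expected to be between 1.5 and 2')
    return min(math.ceil(peak_concurrency * factor), qps_limit)


print(recommended_pool_size(peak_concurrency=10, qps_limit=50))              # 15
print(recommended_pool_size(peak_concurrency=40, qps_limit=50, factor=2.0))  # 50
```

The result could then be passed as `max_size` when the pool is created.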

Complete code

```
#!/usr/bin/env python3
# Copyright (C) Alibaba Group. All Rights Reserved.
# MIT License (https://opensource.org/licenses/MIT)

import os
import time
import threading

import dashscope
from dashscope.audio.tts_v2 import *


USE_CONNECTION_POOL = True
text_to_synthesize = [
    'Sentence 1: Welcome to the Alibaba Cloud speech synthesis service.',
    'Sentence 2: Welcome to the Alibaba Cloud speech synthesis service.',
    'Sentence 3: Welcome to the Alibaba Cloud speech synthesis service.',
]


def init_dashscope_api_key():
    '''
    Set your DashScope API key. For more information, see
    https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
    '''
    if 'DASHSCOPE_API_KEY' in os.environ:
        dashscope.api_key = os.environ[
            'DASHSCOPE_API_KEY']  # load the API key from the DASHSCOPE_API_KEY environment variable
    else:
        dashscope.api_key = '<your-dashscope-api-key>'  # set the API key manually


# Set the API key before the pool is created, because the pool
# establishes its WebSocket connections during initialization.
init_dashscope_api_key()

connectionPool = None
if USE_CONNECTION_POOL:
    print('creating connection pool')
    start_time = time.time() * 1000
    connectionPool = SpeechSynthesizerObjectPool(max_size=3)
    end_time = time.time() * 1000
    print('connection pool created, cost: {} ms'.format(end_time - start_time))


def synthesis_text_to_speech_and_play_by_streaming_mode(text, task_id):
    '''
    Synthesize speech for the given text in streaming mode with an asynchronous call,
    and save the synthesized audio to a file as it arrives.
    For more information, see https://www.alibabacloud.com/help/document_detail/2712523.html
    '''
    global USE_CONNECTION_POOL, connectionPool

    complete_event = threading.Event()

    # Define a callback to handle the result
    class Callback(ResultCallback):
        def __init__(self):
            super().__init__()
            self.file = None
            self.failed = False

        def on_open(self):
            # When using an object pool, on_open is called after the task starts.
            self.file = open(f'result_{task_id}.mp3', 'wb')
            print(f'[task_{task_id}] start')

        def on_complete(self):
            print(f'[task_{task_id}] speech synthesis task completed successfully.')
            complete_event.set()

        def on_error(self, message: str):
            print(f'[task_{task_id}] speech synthesis task failed, {message}')
            # Record the failure and wake up the waiting thread so that it does not wait forever.
            self.failed = True
            complete_event.set()

        def on_close(self):
            # When using an object pool, on_close is called after the task is finished.
            if self.file is not None:
                self.file.close()
                self.file = None
            print(f'[task_{task_id}] finished')

        def on_event(self, message):
            # print(f'recv speech synthesis message {message}')
            pass

        def on_data(self, data: bytes) -> None:
            # Save the audio to a file. The data could also be sent to a player here.
            if self.file is not None:
                self.file.write(data)

    # Create the callback instance for this task
    synthesizer_callback = Callback()

    # Initialize the speech synthesizer
    # you can customize the synthesis parameters, such as voice, format, sample_rate, or other parameters
    if USE_CONNECTION_POOL:
        speech_synthesizer = connectionPool.borrow_synthesizer(
            model='cosyvoice-v3-flash',
            voice='longanyang',
            seed=12382,
            callback=synthesizer_callback
        )
    else:
        speech_synthesizer = SpeechSynthesizer(model='cosyvoice-v3-flash',
                                               voice='longanyang',
                                               seed=12382,
                                               callback=synthesizer_callback)
    try:
        speech_synthesizer.call(text)
    except Exception as e:
        print(f'[task_{task_id}] speech synthesis task failed, {e}')
        if USE_CONNECTION_POOL:
            # If the task fails when you use a connection pool, close the synthesizer connection manually.
            speech_synthesizer.close()
        return

    print('[task_{}] Synthesized text: {}'.format(task_id, text))
    complete_event.wait()
    if synthesizer_callback.failed:
        # Do not return the synthesizer of a failed task to the pool.
        return
    print('[task_{}][Metric] requestId: {}, first package delay ms: {}'.format(
        task_id,
        speech_synthesizer.get_last_request_id(),
        speech_synthesizer.get_first_package_delay()))
    if USE_CONNECTION_POOL:
        connectionPool.return_synthesizer(speech_synthesizer)


# main function
if __name__ == '__main__':
    task_thread_list = []
    for task_id in range(len(text_to_synthesize)):
        thread = threading.Thread(
            target=synthesis_text_to_speech_and_play_by_streaming_mode,
            args=(text_to_synthesize[task_id], task_id))
        task_thread_list.append(thread)

    for task_thread in task_thread_list:
        task_thread.start()

    for task_thread in task_thread_list:
        task_thread.join()

    if USE_CONNECTION_POOL:
        connectionPool.shutdown()
```

Resource management and troubleshooting

Task success: When a speech synthesis task completes successfully, you must call connectionPool.return_synthesizer(speech_synthesizer) to return the SpeechSynthesizer object to the pool for reuse.
Important
Do not return SpeechSynthesizer objects from incomplete or failed tasks.
Task failure: If a task is interrupted by an internal SDK exception or a business logic exception, you must manually close the underlying WebSocket connection by calling speech_synthesizer.close().
After all speech synthesis tasks are complete, close the object pool by calling connectionPool.shutdown().
If a `TaskFailed` error occurs, no extra handling is required.
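
The return-on-success and close-on-failure rules above can be wrapped in a context manager so that callers cannot forget either branch. The following is a sketch under stated assumptions: `borrowed_synthesizer` is a hypothetical helper, not part of the SDK, and `_StubPool` and `_StubSynthesizer` are stand-ins that only mimic the `borrow_synthesizer`, `return_synthesizer`, and `close` calls shown in this document so that the example runs without a network.

```python
from contextlib import contextmanager


@contextmanager
def borrowed_synthesizer(pool, **borrow_kwargs):
    """Borrow a synthesizer; return it on success, close it on failure."""
    synthesizer = pool.borrow_synthesizer(**borrow_kwargs)
    try:
        yield synthesizer
    except Exception:
        # Failed task: close the connection and do not return the object.
        synthesizer.close()
        raise
    # Successful task: hand the object back for reuse.
    pool.return_synthesizer(synthesizer)


# Hypothetical stand-ins so that the sketch runs without the SDK.
class _StubSynthesizer:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class _StubPool:
    def __init__(self):
        self.returned = []

    def borrow_synthesizer(self, **kwargs):
        return _StubSynthesizer()

    def return_synthesizer(self, synthesizer):
        self.returned.append(synthesizer)


stub_pool = _StubPool()

with borrowed_synthesizer(stub_pool, model='cosyvoice-v3-flash') as ok:
    pass  # a real task would start synthesis and wait for completion here

try:
    with borrowed_synthesizer(stub_pool, model='cosyvoice-v3-flash') as bad:
        raise RuntimeError('simulated business logic failure')
except RuntimeError:
    pass

print(len(stub_pool.returned), ok.closed, bad.closed)  # 1 False True
```

With an asynchronous callback, the body of the `with` block has to wait for the completion event before it exits. Otherwise the object would be returned while its task is still running.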

Java SDK: Connection and object pool optimization

The Java SDK uses a built-in connection pool and a custom object pool that work together to deliver optimal performance.

Connection pool: The SDK integrates an OkHttp3 connection pool to manage and reuse underlying WebSocket connections. This reduces network handshake overhead. This feature is enabled by default.
Object pool: Implemented based on commons-pool2, this pool maintains a set of SpeechSynthesizer objects with pre-established connections. Retrieving an object from the pool eliminates connection setup latency and significantly reduces first-packet latency.

Implementation steps

Add dependencies.

Based on your project's build tool, add `dashscope-sdk-java` and `commons-pool2` to your dependency configuration file.

The following sections show the configurations for Maven and Gradle.

Maven

Open the pom.xml file of your Maven project.
Add the following dependency information within the <dependencies> tag.

```
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>dashscope-sdk-java</artifactId>
    <!-- Replace 'the-latest-version' with version 2.16.6 or later. You can find the relevant version numbers at: https://mvnrepository.com/artifact/com.alibaba/dashscope-sdk-java -->
    <version>the-latest-version</version>
</dependency>

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-pool2</artifactId>
    <!-- Replace 'the-latest-version' with the latest version. You can find the relevant version numbers at: https://mvnrepository.com/artifact/org.apache.commons/commons-pool2 -->
    <version>the-latest-version</version>
</dependency>
```

Save the pom.xml file.
Use a Maven command, such as mvn clean install or mvn compile, to update the project dependencies.

Gradle

Open the build.gradle file of your Gradle project.

Add the following dependency information within the dependencies block.

```
dependencies {
    // Replace 'the-latest-version' with version 2.16.6 or later. You can find the relevant version numbers at: https://mvnrepository.com/artifact/com.alibaba/dashscope-sdk-java
    implementation group: 'com.alibaba', name: 'dashscope-sdk-java', version: 'the-latest-version'

    // Replace 'the-latest-version' with the latest version. You can find the relevant version numbers at: https://mvnrepository.com/artifact/org.apache.commons/commons-pool2
    implementation group: 'org.apache.commons', name: 'commons-pool2', version: 'the-latest-version'
}
```

Save the build.gradle file.
In the command line, navigate to your project's root directory and run the following Gradle command to update the project dependencies.
```
./gradlew build --refresh-dependencies
```
If you use Windows, run the following command:
```
gradlew build --refresh-dependencies
```

Configure the connection pool.

You can configure key connection pool parameters using environment variables:

Environment variable	Description
DASHSCOPE_CONNECTION_POOL_SIZE	The size of the connection pool. Recommended value: At least twice your peak concurrency. Default value: 32.
DASHSCOPE_MAXIMUM_ASYNC_REQUESTS	The maximum number of asynchronous requests. Recommended value: Same as `DASHSCOPE_CONNECTION_POOL_SIZE`. Default value: 32.
DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST	The maximum number of asynchronous requests per host. Recommended value: Same as `DASHSCOPE_CONNECTION_POOL_SIZE`. Default value: 32.

Configure the object pool.

You can configure the object pool size using an environment variable:

Environment variable	Description
COSYVOICE_OBJECTPOOL_SIZE	The object pool size. Recommended value: 1.5 to 2 times your peak concurrency. Default value: 500.

Important

The object pool size (COSYVOICE_OBJECTPOOL_SIZE) must be less than or equal to the connection pool size (DASHSCOPE_CONNECTION_POOL_SIZE). Otherwise, if the connection pool is full when a pooled object requests a connection, the calling thread is blocked until a connection becomes available.
The object pool size must not exceed the QPS limit for your account.
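
Both constraints can be checked before the Java service is started. The following pre-flight script is a sketch, not an official tool: `check_pool_settings` and the `qps_limit` argument are hypothetical, and the fallback values are the defaults documented above (500 for the object pool, 32 for the connection pool).

```python
import os

# Defaults as documented above.
DEFAULT_OBJECT_POOL_SIZE = 500
DEFAULT_CONNECTION_POOL_SIZE = 32


def check_pool_settings(env=None, qps_limit=None):
    """Return a list of problems; an empty list means the settings are consistent."""
    env = os.environ if env is None else env
    object_pool = int(env.get('COSYVOICE_OBJECTPOOL_SIZE', DEFAULT_OBJECT_POOL_SIZE))
    connection_pool = int(
        env.get('DASHSCOPE_CONNECTION_POOL_SIZE', DEFAULT_CONNECTION_POOL_SIZE))
    problems = []
    if object_pool > connection_pool:
        problems.append(
            f'object pool size {object_pool} exceeds connection pool size {connection_pool}')
    if qps_limit is not None and object_pool > qps_limit:
        problems.append(
            f'object pool size {object_pool} exceeds the QPS limit {qps_limit}')
    return problems


# Neither variable set: the documented defaults are compared.
print(check_pool_settings(env={}))
# Explicit values that satisfy both constraints.
print(check_pool_settings(
    env={'COSYVOICE_OBJECTPOOL_SIZE': '150', 'DASHSCOPE_CONNECTION_POOL_SIZE': '300'},
    qps_limit=200))
```

With both defaults left unchanged (500 and 32), the check reports a violation, so at least one of the two variables needs to be set explicitly.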

Use the following code to create the object pool:

```
class CosyvoiceObjectPool {
    // ... Other code is omitted here. For the complete example, see the full code.
    public static GenericObjectPool<SpeechSynthesizer> getInstance() {
        lock.lock();
        try {
            if (synthesizerPool == null) {
                // You can set the object pool size here or in the COSYVOICE_OBJECTPOOL_SIZE environment variable.
                // Set it to 1.5 to 2 times the maximum concurrent connections of the server.
                int objectPoolSize = getObjectivePoolSize();
                SpeechSynthesizerObjectFactory speechSynthesizerObjectFactory =
                        new SpeechSynthesizerObjectFactory();
                GenericObjectPoolConfig<SpeechSynthesizer> config =
                        new GenericObjectPoolConfig<>();
                config.setMaxTotal(objectPoolSize);
                config.setMaxIdle(objectPoolSize);
                config.setMinIdle(objectPoolSize);
                synthesizerPool =
                        new GenericObjectPool<>(speechSynthesizerObjectFactory, config);
            }
            return synthesizerPool;
        } finally {
            // Release the lock even if pool creation throws.
            lock.unlock();
        }
    }
}
```

Retrieve a SpeechSynthesizer object from the object pool.
With the configuration shown above, the pool never holds more than `maxTotal` objects. If all objects are in use, `borrowObject()` blocks until an object is returned, which is the default commons-pool2 behavior (`blockWhenExhausted`).
Size the pool so that it is not exhausted at your peak concurrency. Any object created outside the pool must be initialized from scratch and must establish a new WebSocket connection, so it does not benefit from resource reuse.
```
synthesizer = CosyvoiceObjectPool.getInstance().borrowObject();
```
Perform speech synthesis.
Call the `call` or `streamingCall` method of the SpeechSynthesizer object to perform speech synthesis.
Return the SpeechSynthesizer object.
After the speech synthesis task is complete, return the SpeechSynthesizer object so that it can be reused by subsequent tasks.
Do not return objects from incomplete or failed tasks.
```
CosyvoiceObjectPool.getInstance().returnObject(synthesizer);
```

Complete code

```
package org.alibaba.bailian.example.examples;

import com.alibaba.dashscope.audio.tts.SpeechSynthesisResult;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisAudioFormat;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.exception.NoApiKeyException;
import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.commons.pool2.impl.GenericObjectPool;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

/**
 * You need to import the org.apache.commons.pool2 and DashScope packages into your project.
 *
 * DashScope SDK versions 2.16.6 and later are optimized for high-concurrency scenarios.
 * Versions earlier than DashScope SDK 2.16.6 are not recommended for use in high-concurrency scenarios.
 *
 *
 * Before making high-concurrency calls to the text-to-speech (TTS) service,
 * configure the connection pool parameters using the following environment variables.
 *
 * DASHSCOPE_MAXIMUM_ASYNC_REQUESTS
 * DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST
 * DASHSCOPE_CONNECTION_POOL_SIZE
 *
 */

class SpeechSynthesizerObjectFactory
        extends BasePooledObjectFactory<SpeechSynthesizer> {
    public SpeechSynthesizerObjectFactory() {
        super();
    }
    @Override
    public SpeechSynthesizer create() throws Exception {
        return new SpeechSynthesizer();
    }

    @Override
    public PooledObject<SpeechSynthesizer> wrap(SpeechSynthesizer obj) {
        return new DefaultPooledObject<>(obj);
    }
}

class CosyvoiceObjectPool {
    public static GenericObjectPool<SpeechSynthesizer> synthesizerPool;
    public static String COSYVOICE_OBJECTPOOL_SIZE_ENV = "COSYVOICE_OBJECTPOOL_SIZE";
    public static int DEFAULT_OBJECT_POOL_SIZE = 500;
    private static Lock lock = new java.util.concurrent.locks.ReentrantLock();
    public static int getObjectivePoolSize() {
        try {
            // parseInt throws NumberFormatException if the variable is unset or not a number.
            Integer n = Integer.parseInt(System.getenv(COSYVOICE_OBJECTPOOL_SIZE_ENV));
            System.out.println("Using Object Pool Size In Env: "+ n);
            return n;
        } catch (NumberFormatException e) {
            System.out.println("Using Default Object Pool Size: "+ DEFAULT_OBJECT_POOL_SIZE);
            return DEFAULT_OBJECT_POOL_SIZE;
        }
    }
    public static GenericObjectPool<SpeechSynthesizer> getInstance() {
        lock.lock();
        try {
            if (synthesizerPool == null) {
                // You can set the object pool size here or in the COSYVOICE_OBJECTPOOL_SIZE environment variable.
                // Set it to 1.5 to 2 times the maximum concurrent connections of the server.
                int objectPoolSize = getObjectivePoolSize();
                SpeechSynthesizerObjectFactory speechSynthesizerObjectFactory =
                        new SpeechSynthesizerObjectFactory();
                GenericObjectPoolConfig<SpeechSynthesizer> config =
                        new GenericObjectPoolConfig<>();
                config.setMaxTotal(objectPoolSize);
                config.setMaxIdle(objectPoolSize);
                config.setMinIdle(objectPoolSize);
                synthesizerPool =
                        new GenericObjectPool<>(speechSynthesizerObjectFactory, config);
            }
            return synthesizerPool;
        } finally {
            // Release the lock even if pool creation throws.
            lock.unlock();
        }
    }
}

class SynthesizeTaskWithCallback implements Runnable {
    String[] textArray;
    String requestId;
    long timeCost;
    public SynthesizeTaskWithCallback(String[] textArray) {
        this.textArray = textArray;
    }
    @Override
    public void run() {
        SpeechSynthesizer synthesizer = null;
        long startTime = System.currentTimeMillis();
        // Set to true if onError is received or an exception is thrown
        final boolean[] hasError = {false};
        try {
            class ReactCallback extends ResultCallback<SpeechSynthesisResult> {
                ReactCallback() {}

                @Override
                public void onEvent(SpeechSynthesisResult message) {
                    if (message.getAudioFrame() != null) {
                        try {
                            byte[] bytesArray = message.getAudioFrame().array();
                            System.out.println("Received audio. Audio file stream length: " + bytesArray.length);
                        } catch (Exception e) {
                            throw new RuntimeException(e);
                        }
                    }
                }

                @Override
                public void onComplete() {}

                @Override
                public void onError(Exception e) {
                    System.out.println(e.getMessage());
                    e.printStackTrace();
                    hasError[0] = true;
                }
            }

            // Replace your-dashscope-api-key with your API key.
            String dashScopeApiKey = "your-dashscope-api-key";

            SpeechSynthesisParam param =
                    SpeechSynthesisParam.builder()
                            .model("cosyvoice-v1")
                            .voice("longxiaochun")
                            .format(SpeechSynthesisAudioFormat
                                    .MP3_22050HZ_MONO_256KBPS) // Use PCM or MP3 for streaming synthesis.
                            .apiKey(dashScopeApiKey)
                            .build();

            try {
                synthesizer = CosyvoiceObjectPool.getInstance().borrowObject();
                synthesizer.updateParamAndCallback(param, new ReactCallback());
                for (String text : textArray) {
                    synthesizer.streamingCall(text);
                }
                Thread.sleep(20);
                synthesizer.streamingComplete(60000);
                requestId = synthesizer.getLastRequestId();
            } catch (Exception e) {
                System.out.println("Exception e: " + e.toString());
                hasError[0] = true;
            }
        } catch (Exception e) {
            hasError[0] = true;
            throw new RuntimeException(e);
        }
        if (synthesizer != null) {
            try {
                if (hasError[0]) {
                    // If an exception occurs, close the connection and invalidate the object in the pool.
                    synthesizer.getDuplexApi().close(1000, "bye");
                    CosyvoiceObjectPool.getInstance().invalidateObject(synthesizer);
                } else {
                    // If the task finishes normally, return the object.
                    CosyvoiceObjectPool.getInstance().returnObject(synthesizer);
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            long endTime = System.currentTimeMillis();
            timeCost = endTime - startTime;
            System.out.println("[Thread " + Thread.currentThread() + "] Speech synthesis task finished. Time cost: " + timeCost + " ms, RequestId: " + requestId);
        }
    }
}

public class SynthesizeTextToSpeechWithCallbackConcurrently {
    public static void checkoutEnv(String envName, int defaultSize) {
        if (System.getenv(envName) != null) {
            System.out.println("[ENV CHECK]: " + envName + " "
                    + System.getenv(envName));
        } else {
            System.out.println("[ENV CHECK]: " + envName
                    + " Using Default which is " + defaultSize);
        }
    }

    public static void main(String[] args)
            throws InterruptedException, NoApiKeyException {
        // Check for connection pool environment variables
        checkoutEnv("DASHSCOPE_CONNECTION_POOL_SIZE", 32);
        checkoutEnv("DASHSCOPE_MAXIMUM_ASYNC_REQUESTS", 32);
        checkoutEnv("DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST", 32);
        checkoutEnv(CosyvoiceObjectPool.COSYVOICE_OBJECTPOOL_SIZE_ENV, CosyvoiceObjectPool.DEFAULT_OBJECT_POOL_SIZE);

        int runTimes = 3;
        // Create a thread pool with one worker thread per synthesis task
        ExecutorService executorService = Executors.newFixedThreadPool(runTimes);

        for (int i = 0; i < runTimes; i++) {
            executorService.submit(new SynthesizeTaskWithCallback(new String[] {
                    "Moonlight before my bed,", "Is it frost upon the ground?", "I lift my head to see the moon,", "I drop my head and think of home."}));
        }

        // Shut down the ExecutorService and wait for all tasks to complete
        executorService.shutdown();
        executorService.awaitTermination(1, TimeUnit.MINUTES);
        System.exit(0);
    }
}
```

Recommended configurations

The following configurations are based on test results from running the CosyVoice speech synthesis service on Alibaba Cloud servers with the specified configurations. High concurrency may cause task processing delays.

Concurrency per server is the number of CosyVoice speech synthesis tasks that can run simultaneously. This value is equivalent to the number of worker threads.

Server configuration (Alibaba Cloud)	Maximum concurrency per server	Object pool size	Connection pool size
4 cores 8 GiB	100	500	2000
8 cores 16 GiB	150	500	2000
16 cores 32 GiB	200	500	2000

Resource management and troubleshooting

Task success: When a speech synthesis task completes successfully, you must call the `returnObject` method of `GenericObjectPool` to return the SpeechSynthesizer object to the pool for reuse.
In the provided code example, this is done by calling CosyvoiceObjectPool.getInstance().returnObject(synthesizer).
Important
Do not return SpeechSynthesizer objects from incomplete or failed tasks.
Task failure: If a task is interrupted by an internal SDK exception or a business logic exception, you must perform the following two operations:
1. Manually close the underlying WebSocket connection.
2. Invalidate the object in the object pool to prevent it from being used again.
```
// This corresponds to the following content in the current code
// Close the connection
synthesizer.getDuplexApi().close(1000, "bye");
// Invalidate the problematic synthesizer in the object pool
CosyvoiceObjectPool.getInstance().invalidateObject(synthesizer);
```
If a `TaskFailed` error occurs, no extra handling is required.

Common Java SDK exceptions

Exception 1: The service traffic is stable, but the number of TCP connections on the server continues to increase

Cause:

Type 1:

Each SDK object requests a connection when it is created. If you do not use an object pool, the object is destroyed after each task is completed. In this case, the connection enters an unreferenced state and is disconnected only when the server connection times out after 61 seconds. As a result, the connection cannot be reused for 61 seconds.

In high-concurrency scenarios, a new task creates a new connection if no reusable connections are available. This has the following consequences:

The number of connections continues to increase.
The server experiences performance degradation due to insufficient server resources caused by an excessive number of connections.
The connection pool is full, and new tasks are blocked because they need to wait for available connections.

Type 2:

The MaxIdle parameter of the object pool is set to a value smaller than the MaxTotal parameter. As a result, when objects are idle, objects that exceed the MaxIdle limit are destroyed, which causes connection leaks. The leaked connections are disconnected after a timeout of 61 seconds. Similar to Type 1, this causes the number of connections to increase continuously.

Solution:

For Type 1, use an object pool.

For Type 2, check the object pool configuration parameters, set MaxIdle and MaxTotal to the same value, and disable the automatic object pool destruction policy.

Exception 2: The task takes about 60 seconds longer than a normal call

The cause is the same as for Exception 1: the connection pool has reached its maximum number of connections, so a new task must wait up to 61 seconds for an unreferenced connection to time out before it can obtain a connection. Apply the solutions for Exception 1.

Exception 3: Tasks are slow when the service starts and then gradually return to normal

Cause:

Each pooled object reuses its own WebSocket connection, so connections are created only when the service starts. If high-concurrency calls begin immediately at service startup, many WebSocket connections are created at the same time, which may cause blocking.

Solution:

Gradually increase the concurrency or add prefetch tasks after the service starts.
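
The ramp-up idea does not depend on the SDK language, so it is sketched here in Python for brevity. Everything in this sketch is an assumption for illustration: `ramp_up_schedule`, `warm_up`, and the `run_task` callable are hypothetical names, and the number of steps and the pause between them would have to be tuned for a real service.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def ramp_up_schedule(target_concurrency, steps):
    """Return increasing concurrency levels that end at target_concurrency."""
    if target_concurrency < 1 or steps < 1:
        raise ValueError('target_concurrency and steps must be at least 1')
    levels = []
    for i in range(1, steps + 1):
        level = max(1, target_concurrency * i // steps)
        if not levels or level > levels[-1]:  # skip repeated levels
            levels.append(level)
    return levels


def warm_up(run_task, target_concurrency, steps=4, pause_seconds=1.0):
    """Run run_task at each concurrency level before real traffic is admitted."""
    for level in ramp_up_schedule(target_concurrency, steps):
        with ThreadPoolExecutor(max_workers=level) as executor:
            # Start `level` tasks in parallel and wait for all of them.
            list(executor.map(lambda _: run_task(), range(level)))
        time.sleep(pause_seconds)


print(ramp_up_schedule(100, 4))  # [25, 50, 75, 100]
```

In practice, `run_task` would be a short prefetch synthesis request, so that each step opens only a fraction of the WebSocket connections at a time.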

Exception 4: The server reports the "Invalid action('run-task')! Please follow the protocol!" error

Cause:

After a client error occurs, the server is not aware of the error, and the connection remains in the task-in-progress state. If the connection and object are reused to start the next task, this leads to a protocol error and causes the next task to fail.

Solution:

After an exception is thrown, explicitly close the WebSocket connection and invalidate the object in the object pool instead of returning it.

Exception 5: The service traffic is stable, but the call volume has abnormal spikes

Cause:

Creating too many WebSocket connections simultaneously causes blocking. However, incoming service traffic continues, which leads to a short-term backlog of tasks. After the blocking is resolved, all backlogged tasks are called immediately. This causes spikes in call volume and may momentarily exceed the concurrency limit of the Alibaba Cloud account, resulting in task failures, server performance degradation, and other issues.

This situation of creating too many WebSocket connections instantaneously often occurs in the following scenarios:

The service startup stage
A network exception occurs, which causes many WebSocket connections to be interrupted and reconnected simultaneously.
Many server-side errors occur at a certain point in time, leading to many WebSocket reconnections. A common error is that the concurrency exceeds the account limit ("Requests rate limit exceeded, please try again later.").

Solution:

Check the network conditions.
Check whether many other server-side errors occurred before the spike.
Increase the concurrency limit of your Alibaba Cloud account.
Reduce the sizes of the object pool and connection pool, and limit the maximum concurrency using the upper limit of the object pool.
Upgrade the server configuration or increase the number of servers.
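
Capping concurrency at the object pool size, as suggested in the list above, amounts to placing a counting semaphore in front of the task entry point. The sketch below shows the idea in Python with the standard library only. `ConcurrencyLimiter` is a hypothetical helper, and the limit of 3 stands in for whatever object pool size is configured.

```python
import threading
import time


class ConcurrencyLimiter:
    """Allows at most `limit` tasks to run at the same time."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0  # highest number of simultaneous tasks observed

    def run(self, task, *args):
        with self._slots:  # blocks while `limit` tasks are already running
            with self._lock:
                self._running += 1
                self.peak = max(self.peak, self._running)
            try:
                return task(*args)
            finally:
                with self._lock:
                    self._running -= 1


limiter = ConcurrencyLimiter(limit=3)  # for example, the object pool size
threads = [
    threading.Thread(target=limiter.run, args=(time.sleep, 0.01))
    for _ in range(10)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(limiter.peak <= 3)  # True
```

Backlogged tasks then queue at the semaphore instead of being released all at once, which smooths the spike in call volume after a blockage clears.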

Exception 6: All tasks slow down as the concurrency increases

Solution:

Check whether the network bandwidth limit has been reached.
Check whether the actual concurrency is too high.