CosyVoice uses WebSocket for real-time audio streaming. Creating a connection for every request adds latency in high-concurrency scenarios. The DashScope SDK provides connection pools and object pools that reuse connections, eliminating setup costs and reducing first-packet latency.
Prerequisites
- Python SDK: version 1.25.2 or later
- Java SDK: version 2.16.6 or later
Python SDK: Object pool optimization
The Python SDK uses SpeechSynthesizerObjectPool to manage and reuse SpeechSynthesizer objects. The pool creates a specified number of instances with pre-established WebSocket connections. Borrowing an object returns one with an active connection for immediate use. Returning the object keeps its connection open for reuse.
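The borrow-and-return behavior described above can be illustrated with a toy pool. This is a minimal sketch, not the SDK's implementation: ToyObjectPool and FakeConnection are hypothetical names, and the "connection" is just a flag standing in for an open WebSocket.

```python
import queue


class FakeConnection:
    """Hypothetical stand-in for an object that holds an open connection."""

    def __init__(self, pooled):
        self.pooled = pooled  # True if the pool created it up front
        self.is_open = True

    def close(self):
        self.is_open = False


class ToyObjectPool:
    """Hypothetical pool that pre-creates max_size objects and reuses them."""

    def __init__(self, max_size):
        self._idle = queue.Queue()
        for _ in range(max_size):
            self._idle.put(FakeConnection(pooled=True))

    def borrow(self):
        try:
            # Hand out an object whose connection is already established.
            return self._idle.get_nowait()
        except queue.Empty:
            # Pool exhausted: create a fresh object that pays the setup cost.
            return FakeConnection(pooled=False)

    def give_back(self, obj):
        # Only pre-created objects with a live connection go back to the pool.
        if obj.pooled and obj.is_open:
            self._idle.put(obj)
        else:
            obj.close()

    def idle_count(self):
        return self._idle.qsize()


pool = ToyObjectPool(max_size=2)
a, b, c = pool.borrow(), pool.borrow(), pool.borrow()
print(a.pooled, b.pooled, c.pooled)  # True True False
pool.give_back(a)
print(pool.borrow() is a)  # True
```

The third borrow exceeds the pool size, so it gets a non-pooled object. This mirrors the behavior the steps below describe for borrowing beyond the maximum size.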
Implementation steps
- Install dependencies.
    pip install -U dashscope
- Create and configure the object pool. Set the pool size to 1.5x to 2x your peak concurrency, without exceeding your QPS limit. The code below creates a global singleton pool. Creating the pool initializes the SpeechSynthesizer objects and establishes their connections, so it takes some time.
    from dashscope.audio.tts_v2 import SpeechSynthesizerObjectPool
    connectionPool = SpeechSynthesizerObjectPool(max_size=20)
- Borrow a SpeechSynthesizer object from the pool. If more objects are borrowed than the pool's maximum size, the system creates a new SpeechSynthesizer object that does not benefit from pooling.
    speech_synthesizer = connectionPool.borrow_synthesizer(
        model='cosyvoice-v3-flash',
        voice='longanyang',
        seed=12382,
        callback=synthesizer_callback
    )
- Perform speech synthesis. Call the call or streaming_call method of the SpeechSynthesizer object to synthesize speech.
- Return the SpeechSynthesizer object. After the speech synthesis task completes, return the SpeechSynthesizer object for reuse by later tasks.
  Important: Do not return objects from incomplete or failed tasks.
    connectionPool.return_synthesizer(speech_synthesizer)
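The sizing rule from the steps above (1.5x to 2x peak concurrency, capped by the QPS limit) can be written as a small helper. This is a sketch under stated assumptions: recommended_pool_size is a hypothetical function, not part of the SDK, and the numbers in the usage lines are made up.

```python
import math


def recommended_pool_size(peak_concurrency, qps_limit, factor=1.5):
    """Return a pool size of factor x peak concurrency, capped at the QPS limit."""
    if not 1.5 <= factor <= 2.0:
        raise ValueError("factor should be between 1.5 and 2.0")
    size = math.ceil(peak_concurrency * factor)
    # Never size the pool above the account's QPS limit.
    return min(size, qps_limit)


print(recommended_pool_size(10, 50))              # 15
print(recommended_pool_size(40, 50, factor=2.0))  # 50 (capped by the QPS limit)
```

The result could then be passed as max_size when creating the pool.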
Complete code
#!/usr/bin/env python3
# Copyright (C) Alibaba Group. All Rights Reserved.
# MIT License (https://opensource.org/licenses/MIT)
import os
import time
import threading

import dashscope
from dashscope.audio.tts_v2 import (ResultCallback, SpeechSynthesizer,
                                    SpeechSynthesizerObjectPool)

USE_CONNECTION_POOL = True

text_to_synthesize = [
    'First sentence: Welcome to Alibaba Cloud speech synthesis.',
    'Second sentence: Welcome to Alibaba Cloud speech synthesis.',
    'Third sentence: Welcome to Alibaba Cloud speech synthesis.',
]

connectionPool = None
if USE_CONNECTION_POOL:
    print('creating connection pool')
    start_time = time.time() * 1000
    connectionPool = SpeechSynthesizerObjectPool(max_size=3)
    end_time = time.time() * 1000
    print('connection pool created, cost: {} ms'.format(end_time - start_time))


def init_dashscope_api_key():
    '''
    Set your DashScope API-key. More information:
    https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
    '''
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    if 'DASHSCOPE_API_KEY' in os.environ:
        # load API-key from environment variable DASHSCOPE_API_KEY
        dashscope.api_key = os.environ['DASHSCOPE_API_KEY']
    else:
        dashscope.api_key = '<your-dashscope-api-key>'  # set API-key manually


def synthesis_text_to_speech_and_play_by_streaming_mode(text, task_id):
    '''
    Synthesize speech with given text by streaming mode, async call and play the synthesized audio in real-time.
    for more information, please refer to https://www.alibabacloud.com/help/document_detail/2712523.html
    '''
    complete_event = threading.Event()

    # Define a callback to handle the result
    class Callback(ResultCallback):
        failed = False  # set to True when the service reports an error

        def on_open(self):
            # when using object pool, on_open will be called after task start
            self.file = open(f'result_{task_id}.mp3', 'wb')
            print(f'[task_{task_id}] start')

        def on_complete(self):
            print(f'[task_{task_id}] speech synthesis task complete successfully.')
            complete_event.set()

        def on_error(self, message: str):
            print(f'[task_{task_id}] speech synthesis task failed, {message}')
            self.failed = True
            # unblock the waiting thread so that a failed task does not hang forever
            complete_event.set()

        def on_close(self):
            # when using object pool, on_close will be called after task finished
            self.file.close()
            print(f'[task_{task_id}] finished')

        def on_event(self, message):
            # print(f'recv speech synthesis message {message}')
            pass

        def on_data(self, data: bytes) -> None:
            # send to player
            # save audio to file
            self.file.write(data)

    # Create the speech synthesizer callback
    synthesizer_callback = Callback()

    # Initialize the speech synthesizer
    # you can customize the synthesis parameters, like voice, format, sample_rate or other parameters
    if USE_CONNECTION_POOL:
        speech_synthesizer = connectionPool.borrow_synthesizer(
            model='cosyvoice-v3-flash',
            voice='longanyang',
            seed=12382,
            callback=synthesizer_callback
        )
    else:
        speech_synthesizer = SpeechSynthesizer(model='cosyvoice-v3-flash',
                                               voice='longanyang',
                                               seed=12382,
                                               callback=synthesizer_callback)

    try:
        speech_synthesizer.call(text)
    except Exception as e:
        print(f'[task_{task_id}] speech synthesis task failed, {e}')
        if USE_CONNECTION_POOL:
            # close the synthesizer connection manually if task failed when using connection pool.
            speech_synthesizer.close()
        return

    print('[task_{}] Synthesized text: {}'.format(task_id, text))
    complete_event.wait()
    if synthesizer_callback.failed:
        # The service reported an error (for example TaskFailed): no extra action is needed,
        # and the object must not be returned to the pool.
        return
    print('[task_{}][Metric] requestId: {}, first package delay ms: {}'.format(
        task_id,
        speech_synthesizer.get_last_request_id(),
        speech_synthesizer.get_first_package_delay()))
    if USE_CONNECTION_POOL:
        connectionPool.return_synthesizer(speech_synthesizer)


# main function
if __name__ == '__main__':
    # The following URL is for the Singapore region. If you use models in the Beijing region, replace it with: wss://dashscope.aliyuncs.com/api-ws/v1/inference
    dashscope.base_websocket_api_url = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference'
    init_dashscope_api_key()
    task_thread_list = []
    for task_id in range(3):
        thread = threading.Thread(
            target=synthesis_text_to_speech_and_play_by_streaming_mode,
            args=(text_to_synthesize[task_id], task_id))
        task_thread_list.append(thread)
    for task_thread in task_thread_list:
        task_thread.start()
    for task_thread in task_thread_list:
        task_thread.join()
    if USE_CONNECTION_POOL:
        connectionPool.shutdown()
Resource management and error handling
- Successful task: Call connectionPool.return_synthesizer(speech_synthesizer) to return the object for reuse.
  Important: Do not return SpeechSynthesizer objects from incomplete or failed tasks.
- Failed task: If an SDK exception or business logic error interrupts the task, close the connection manually by calling speech_synthesizer.close().
- After all speech synthesis tasks complete, shut down the object pool by calling connectionPool.shutdown().
- No extra action is needed when the service returns a TaskFailed error.
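One way to make the return-on-success and close-on-failure rules hard to forget is a context manager. The sketch below is an assumption-laden illustration: borrowed_synthesizer is a hypothetical helper, and FakePool and FakeSynthesizer are stand-ins that only mimic the borrow_synthesizer, return_synthesizer, and close method names used in this document.

```python
from contextlib import contextmanager


@contextmanager
def borrowed_synthesizer(pool, **kwargs):
    """Borrow a synthesizer; return it on success, close it if the body raises."""
    synthesizer = pool.borrow_synthesizer(**kwargs)
    try:
        yield synthesizer
    except Exception:
        # Failed task: close the connection and do not return the object.
        synthesizer.close()
        raise
    else:
        # Successful task: hand the object back for reuse.
        pool.return_synthesizer(synthesizer)


class FakeSynthesizer:
    """Hypothetical stand-in for a pooled synthesizer."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class FakePool:
    """Hypothetical stand-in for the object pool."""

    def __init__(self):
        self.returned = []

    def borrow_synthesizer(self, **kwargs):
        return FakeSynthesizer()

    def return_synthesizer(self, synthesizer):
        self.returned.append(synthesizer)


demo_pool = FakePool()
with borrowed_synthesizer(demo_pool, model='demo-model') as ok:
    pass  # run the synthesis and wait for completion here
print(len(demo_pool.returned), ok.closed)  # 1 False
```

With the real pool, the body of the with block would need to wait for the task to finish (for example, on the completion event used in the complete code) so that an unfinished task is never returned.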
Java SDK: Connection pool and object pool optimization
The Java SDK combines an internal connection pool with a custom object pool for optimal performance:
- Connection pool: The SDK uses OkHttp3's connection pool to manage and reuse WebSocket connections, reducing handshake overhead (enabled by default).
- Object pool: Built on commons-pool2, this maintains SpeechSynthesizer objects with pre-established connections. Borrowing an object avoids connection setup and reduces first-packet latency.
Implementation steps
- Add dependencies. Add dashscope-sdk-java and commons-pool2 to your project's dependency configuration. Examples:
Maven
  - Open your Maven project's pom.xml file.
  - Add the following dependencies inside the <dependencies> tag.
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>dashscope-sdk-java</artifactId>
        <!-- Replace 'the-latest-version' with version 2.16.6 or later. Check available versions at: https://mvnrepository.com/artifact/com.alibaba/dashscope-sdk-java -->
        <version>the-latest-version</version>
    </dependency>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-pool2</artifactId>
        <!-- Replace 'the-latest-version' with the latest version. Check available versions at: https://mvnrepository.com/artifact/org.apache.commons/commons-pool2 -->
        <version>the-latest-version</version>
    </dependency>
  - Save the pom.xml file.
  - Run a Maven command such as mvn clean install or mvn compile to update project dependencies.
Gradle
  - Open your Gradle project's build.gradle file.
  - Add the following dependencies inside the dependencies block.
    dependencies {
        // Replace 'the-latest-version' with version 2.16.6 or later. Check available versions at: https://mvnrepository.com/artifact/com.alibaba/dashscope-sdk-java
        implementation group: 'com.alibaba', name: 'dashscope-sdk-java', version: 'the-latest-version'
        // Replace 'the-latest-version' with the latest version. Check available versions at: https://mvnrepository.com/artifact/org.apache.commons/commons-pool2
        implementation group: 'org.apache.commons', name: 'commons-pool2', version: 'the-latest-version'
    }
  - Save the build.gradle file.
  - In your terminal, navigate to your project root directory and run the following Gradle command to update dependencies.
    ./gradlew build --refresh-dependencies
    Or, if you are using Windows, run:
    gradlew build --refresh-dependencies
- Configure the connection pool. Set connection pool parameters via environment variables:
| Environment variable | Description | Default |
| --- | --- | --- |
| DASHSCOPE_CONNECTION_POOL_SIZE | Connection pool size. Set to at least 2x your peak concurrency. | 32 |
| DASHSCOPE_MAXIMUM_ASYNC_REQUESTS | Maximum async requests. Match DASHSCOPE_CONNECTION_POOL_SIZE. | 32 |
| DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST | Maximum async requests per host. Match DASHSCOPE_CONNECTION_POOL_SIZE. | 32 |
- Configure the object pool. Set the object pool size using an environment variable, and use the code below to create the object pool.
  Important: The object pool size (COSYVOICE_OBJECTPOOL_SIZE) must be less than or equal to the connection pool size (DASHSCOPE_CONNECTION_POOL_SIZE). If the connection pool is full when borrowing, the calling thread blocks until a connection is available. The object pool size must not exceed your QPS limit.
| Environment variable | Description | Default |
| --- | --- | --- |
| COSYVOICE_OBJECTPOOL_SIZE | Object pool size. Set to 1.5x to 2x your peak concurrency. | 500 |
    class CosyvoiceObjectPool {
        // ... Other code omitted. See full example below.
        public static GenericObjectPool<SpeechSynthesizer> getInstance() {
            lock.lock();
            try {
                if (synthesizerPool == null) {
                    // You can set the object pool size here. Or set it via the COSYVOICE_OBJECTPOOL_SIZE environment variable.
                    // Recommended value: 1.5x to 2x your server's maximum concurrent connections.
                    int objectPoolSize = getObjectivePoolSize();
                    SpeechSynthesizerObjectFactory speechSynthesizerObjectFactory =
                            new SpeechSynthesizerObjectFactory();
                    GenericObjectPoolConfig<SpeechSynthesizer> config =
                            new GenericObjectPoolConfig<>();
                    config.setMaxTotal(objectPoolSize);
                    config.setMaxIdle(objectPoolSize);
                    config.setMinIdle(objectPoolSize);
                    synthesizerPool =
                            new GenericObjectPool<>(speechSynthesizerObjectFactory, config);
                }
            } finally {
                // Release the lock even if pool creation throws.
                lock.unlock();
            }
            return synthesizerPool;
        }
    }
- Borrow a SpeechSynthesizer object from the pool. If the number of borrowed objects exceeds the pool's maximum size, the system creates a new SpeechSynthesizer object. This newly created object must initialize and establish a WebSocket connection without reusing existing pool connections, so it does not benefit from pooling.
    synthesizer = CosyvoiceObjectPool.getInstance().borrowObject();
- Perform speech synthesis. Call the call or streamingCall method of the SpeechSynthesizer object to perform speech synthesis.
- Return the SpeechSynthesizer object. After the speech synthesis task completes, return the SpeechSynthesizer object for reuse by later tasks.
  Important: Do not return objects from incomplete or failed tasks.
    CosyvoiceObjectPool.getInstance().returnObject(synthesizer);
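The sizing constraints in the configuration steps above can be checked before the service starts. The following is a sketch of a language-neutral pre-flight check, written in Python for brevity. check_pool_settings is a hypothetical helper; the variable names and default values are the ones listed in the tables above.

```python
import os

# Defaults as listed in this document.
DEFAULTS = {
    "DASHSCOPE_CONNECTION_POOL_SIZE": 32,
    "DASHSCOPE_MAXIMUM_ASYNC_REQUESTS": 32,
    "DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST": 32,
    "COSYVOICE_OBJECTPOOL_SIZE": 500,
}


def check_pool_settings(env=os.environ):
    """Return a list of human-readable problems with the pool settings."""
    values = {name: int(env.get(name, default)) for name, default in DEFAULTS.items()}
    connection_pool_size = values["DASHSCOPE_CONNECTION_POOL_SIZE"]
    problems = []
    # The object pool must not be larger than the connection pool.
    if values["COSYVOICE_OBJECTPOOL_SIZE"] > connection_pool_size:
        problems.append(
            "COSYVOICE_OBJECTPOOL_SIZE ({}) exceeds DASHSCOPE_CONNECTION_POOL_SIZE ({})".format(
                values["COSYVOICE_OBJECTPOOL_SIZE"], connection_pool_size))
    # Both async request limits should match the connection pool size.
    for name in ("DASHSCOPE_MAXIMUM_ASYNC_REQUESTS",
                 "DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST"):
        if values[name] != connection_pool_size:
            problems.append(
                "{} ({}) does not match DASHSCOPE_CONNECTION_POOL_SIZE ({})".format(
                    name, values[name], connection_pool_size))
    return problems


# With no variables set, the listed defaults (500 vs. 32) violate the first rule.
for problem in check_pool_settings({}):
    print(problem)
```

Because the default object pool size (500) is larger than the default connection pool size (32), leaving everything unset fails this check, which suggests setting DASHSCOPE_CONNECTION_POOL_SIZE explicitly.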
Complete code
import com.alibaba.dashscope.audio.tts.SpeechSynthesisResult;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisAudioFormat;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.Constants;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.commons.pool2.impl.GenericObjectPool;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

import java.time.LocalDateTime;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

/**
 * Your project must include org.apache.commons.pool2 and DashScope packages.
 *
 * DashScope SDK versions 2.16.6 and later are optimized for high-concurrency scenarios.
 * DashScope SDK versions earlier than 2.16.6 are not recommended for high-concurrency use.
 *
 * Before making high-concurrency calls to the TTS service,
 * configure connection pool parameters using the following environment variables:
 *
 * DASHSCOPE_MAXIMUM_ASYNC_REQUESTS
 * DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST
 * DASHSCOPE_CONNECTION_POOL_SIZE
 */
class SpeechSynthesizerObjectFactory
        extends BasePooledObjectFactory<SpeechSynthesizer> {
    public SpeechSynthesizerObjectFactory() {
        super();
    }

    @Override
    public SpeechSynthesizer create() throws Exception {
        return new SpeechSynthesizer();
    }

    @Override
    public PooledObject<SpeechSynthesizer> wrap(SpeechSynthesizer obj) {
        return new DefaultPooledObject<>(obj);
    }
}

class CosyvoiceObjectPool {
    public static GenericObjectPool<SpeechSynthesizer> synthesizerPool;
    public static String COSYVOICE_OBJECTPOOL_SIZE_ENV = "COSYVOICE_OBJECTPOOL_SIZE";
    public static int DEFAULT_OBJECT_POOL_SIZE = 500;
    private static Lock lock = new java.util.concurrent.locks.ReentrantLock();

    public static int getObjectivePoolSize() {
        try {
            Integer n = Integer.parseInt(System.getenv(COSYVOICE_OBJECTPOOL_SIZE_ENV));
            System.out.println("Using Object Pool Size In Env: " + n);
            return n;
        } catch (NumberFormatException e) {
            System.out.println("Using Default Object Pool Size: " + DEFAULT_OBJECT_POOL_SIZE);
            return DEFAULT_OBJECT_POOL_SIZE;
        }
    }

    public static GenericObjectPool<SpeechSynthesizer> getInstance() {
        lock.lock();
        try {
            if (synthesizerPool == null) {
                // You can set the object pool size here. Or set it via the COSYVOICE_OBJECTPOOL_SIZE environment variable.
                // Recommended value: 1.5x to 2x your server's maximum concurrent connections.
                int objectPoolSize = getObjectivePoolSize();
                SpeechSynthesizerObjectFactory speechSynthesizerObjectFactory =
                        new SpeechSynthesizerObjectFactory();
                GenericObjectPoolConfig<SpeechSynthesizer> config =
                        new GenericObjectPoolConfig<>();
                config.setMaxTotal(objectPoolSize);
                config.setMaxIdle(objectPoolSize);
                config.setMinIdle(objectPoolSize);
                synthesizerPool =
                        new GenericObjectPool<>(speechSynthesizerObjectFactory, config);
            }
        } finally {
            // Release the lock even if pool creation throws.
            lock.unlock();
        }
        return synthesizerPool;
    }
}

class SynthesizeTaskWithCallback implements Runnable {
    String[] textArray;
    String requestId;
    long timeCost;

    public SynthesizeTaskWithCallback(String[] textArray) {
        this.textArray = textArray;
    }

    @Override
    public void run() {
        SpeechSynthesizer synthesizer = null;
        long startTime = System.currentTimeMillis();
        // Set to true when onError is received or an exception is thrown
        final boolean[] hasError = {false};
        try {
            class ReactCallback extends ResultCallback<SpeechSynthesisResult> {
                ReactCallback() {}

                @Override
                public void onEvent(SpeechSynthesisResult message) {
                    if (message.getAudioFrame() != null) {
                        try {
                            byte[] bytesArray = message.getAudioFrame().array();
                            System.out.println("Received audio. Audio stream length: " + bytesArray.length);
                        } catch (Exception e) {
                            throw new RuntimeException(e);
                        }
                    }
                }

                @Override
                public void onComplete() {}

                @Override
                public void onError(Exception e) {
                    System.out.println(e.getMessage());
                    e.printStackTrace();
                    hasError[0] = true;
                }
            }

            SpeechSynthesisParam param =
                    SpeechSynthesisParam.builder()
                            .model("cosyvoice-v3-flash")
                            .voice("longanyang")
                            // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                            // If you do not set the environment variable, replace the line below with: .apiKey("sk-xxx")
                            .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                            .format(SpeechSynthesisAudioFormat
                                    .MP3_22050HZ_MONO_256KBPS) // Use PCM or MP3 for streaming synthesis
                            .build();

            try {
                synthesizer = CosyvoiceObjectPool.getInstance().borrowObject();
                synthesizer.updateParamAndCallback(param, new ReactCallback());
                for (String text : textArray) {
                    synthesizer.streamingCall(text);
                }
                Thread.sleep(20);
                synthesizer.streamingComplete(60000);
                requestId = synthesizer.getLastRequestId();
            } catch (Exception e) {
                System.out.println("Exception e: " + e.toString());
                hasError[0] = true;
            }
        } catch (Exception e) {
            hasError[0] = true;
            throw new RuntimeException(e);
        }

        if (synthesizer != null) {
            try {
                if (hasError[0]) {
                    // If an error occurs, close the connection and invalidate the object in the pool.
                    synthesizer.getDuplexApi().close(1000, "bye");
                    CosyvoiceObjectPool.getInstance().invalidateObject(synthesizer);
                } else {
                    // If the task completes normally, return the object.
                    CosyvoiceObjectPool.getInstance().returnObject(synthesizer);
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            long endTime = System.currentTimeMillis();
            timeCost = endTime - startTime;
            System.out.println("[Thread " + Thread.currentThread() + "] Speech synthesis task completed. Time taken: " + timeCost + " ms, RequestId: " + requestId);
        }
    }
}

@Slf4j
public class SynthesizeTextToSpeechWithCallbackConcurrently {
    public static void checkoutEnv(String envName, int defaultSize) {
        if (System.getenv(envName) != null) {
            System.out.println("[ENV CHECK]: " + envName + " "
                    + System.getenv(envName));
        } else {
            System.out.println("[ENV CHECK]: " + envName
                    + " Using Default which is " + defaultSize);
        }
    }

    public static void main(String[] args)
            throws InterruptedException, NoApiKeyException {
        // The following URL is for the Singapore region. If you use models in the Beijing region, replace it with: wss://dashscope.aliyuncs.com/api-ws/v1/inference
        Constants.baseWebsocketApiUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference";
        // Check for connection pool env
        checkoutEnv("DASHSCOPE_CONNECTION_POOL_SIZE", 32);
        checkoutEnv("DASHSCOPE_MAXIMUM_ASYNC_REQUESTS", 32);
        checkoutEnv("DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST", 32);
        checkoutEnv(CosyvoiceObjectPool.COSYVOICE_OBJECTPOOL_SIZE_ENV, CosyvoiceObjectPool.DEFAULT_OBJECT_POOL_SIZE);

        int runTimes = 3;
        // Create a thread pool that runs the synthesis tasks concurrently
        ExecutorService executorService = Executors.newFixedThreadPool(runTimes);
        for (int i = 0; i < runTimes; i++) {
            // Record the task submission time
            LocalDateTime submissionTime = LocalDateTime.now();
            executorService.submit(new SynthesizeTaskWithCallback(new String[] {
                    "Before my bed, moonlight gleams,",
                    "It seems like frost upon the ground.",
                    "I lift my gaze to watch the bright moon,",
                    "Then bow my head, thinking of home."}));
        }

        // Shut down the ExecutorService and wait for all tasks to complete
        executorService.shutdown();
        executorService.awaitTermination(1, TimeUnit.MINUTES);
        System.exit(0);
    }
}
Recommended ECS configurations
The following configurations are based on tests where only CosyVoice runs on ECS instances of these specs. Exceeding limits may cause delays.
Single-machine concurrency refers to the number of CosyVoice tasks running simultaneously (equivalent to worker threads).
Audio format affects bandwidth. At 200 concurrent connections, PCM requires significantly more bandwidth than MP3. Use compressed formats like MP3 to reduce network overhead.
| Machine spec (Alibaba Cloud ECS) | Max single-machine concurrency | Object pool size | Connection pool size |
| --- | --- | --- | --- |
| 4 vCPUs, 8 GiB memory | 100 | 500 | 2000 |
| 8 vCPUs, 16 GiB memory | 150 | 500 | 2000 |
| 16 vCPUs, 32 GiB memory | 200 | 500 | 2000 |
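The bandwidth remark above can be made concrete with a rough estimate. The arithmetic below is a sketch under loudly stated assumptions: PCM is taken to be 16-bit mono at 22.05 kHz, MP3 is taken at the 256 kbps implied by the MP3_22050HZ_MONO_256KBPS format used in the sample code, and audio is assumed to arrive at playback speed. If the service generates audio faster than real time, peak bandwidth would be higher. aggregate_mbps is a hypothetical helper.

```python
def aggregate_mbps(concurrency, kbps_per_stream):
    """Total downstream bandwidth in Mbps for a number of concurrent streams."""
    return concurrency * kbps_per_stream / 1000


# Assumption: 16-bit mono PCM at 22.05 kHz -> 22050 * 16 bits per second.
PCM_KBPS = 22050 * 16 / 1000
# Assumption: MP3 at the 256 kbps bitrate named in the sample's audio format.
MP3_KBPS = 256

pcm_total = aggregate_mbps(200, PCM_KBPS)
mp3_total = aggregate_mbps(200, MP3_KBPS)
print(round(pcm_total, 2), round(mp3_total, 2))  # 70.56 51.2
```

Under these assumptions PCM needs roughly 1.4x the bandwidth of 256 kbps MP3 at 200 concurrent tasks. The gap would widen with a lower MP3 bitrate or a higher PCM sample rate.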
Resource management and error handling
- Successful task: Call the returnObject method of GenericObjectPool to return the SpeechSynthesizer object to the pool for reuse. In the sample code, this corresponds to CosyvoiceObjectPool.getInstance().returnObject(synthesizer).
  Important: Do not return SpeechSynthesizer objects from incomplete or failed tasks.
- Failed task: If an exception from the SDK or your business logic interrupts a task, perform both of the following actions:
  - Manually close the underlying WebSocket connection.
  - Invalidate the object in the object pool to prevent reuse.
    // In the sample code, this corresponds to:
    // Close the connection
    synthesizer.getDuplexApi().close(1000, "bye");
    // Invalidate the faulty synthesizer in the pool
    CosyvoiceObjectPool.getInstance().invalidateObject(synthesizer);
- No extra action is needed when the service returns a TaskFailed error.
Pre-warming and timing metrics
Pre-warm the system before testing concurrent calls. Pre-warming ensures metrics reflect stable-state performance, excluding one-time connection setup costs.
Connection reuse mechanism
The DashScope Java SDK uses a global singleton connection pool to manage and reuse WebSocket connections, reducing frequent connection establishment and teardown and improving throughput in high-concurrency scenarios.
Key behaviors:
- On-demand creation: The SDK creates connections on demand (on the first call), not at startup.
- Time-limited reuse: After a request completes, the connection remains in the pool for up to 60 seconds for reuse.
  - Within 60 seconds, new requests reuse the existing connection, avoiding handshake overhead.
  - After 60 seconds idle, connections close automatically to free resources.
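The reuse behavior described above can be modeled with a toy single-connection pool and an explicit clock. This is only an illustration of the 60-second idle rule, not the SDK's or OkHttp's implementation; ToyConnectionPool and its methods are hypothetical.

```python
IDLE_TIMEOUT_S = 60  # idle lifetime stated in this document


class ToyConnectionPool:
    """Hypothetical model of one pooled connection with an idle timeout."""

    def __init__(self, idle_timeout_s=IDLE_TIMEOUT_S):
        self.idle_timeout_s = idle_timeout_s
        self._released_at = None  # None means no idle connection is pooled

    def acquire(self, now):
        """Return 'reused' if an idle connection is still alive, else 'new'."""
        alive = (self._released_at is not None
                 and now - self._released_at < self.idle_timeout_s)
        self._released_at = None  # the connection (if any) leaves the idle set
        return "reused" if alive else "new"

    def release(self, now):
        """Put the connection back; the idle timer starts at 'now'."""
        self._released_at = now


pool = ToyConnectionPool()
print(pool.acquire(now=0))    # new (created on demand at the first call)
pool.release(now=1)
print(pool.acquire(now=30))   # reused (idle for 29 seconds)
pool.release(now=31)
print(pool.acquire(now=100))  # new (idle for 69 seconds, connection expired)
```

Times are passed in explicitly so the example runs instantly. The first and last calls are the ones that would pay the full handshake cost.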
Why pre-warming matters
The connection pool contains no reusable active connections in the following cases:
- The application has just started and has not made any calls yet.
- The service has been idle for more than 60 seconds, so all connections have timed out.
In these cases, the first requests trigger full WebSocket connection setup, including TCP/TLS/WebSocket handshakes. Their end-to-end latency is much higher than later requests that reuse connections. This extra time comes from connection initialization, not service processing. Without pre-warming, test results include initial setup time and don't reflect real performance.
Best practice
To collect reliable performance data, follow these pre-warming steps before formal stress testing or latency measurement:
- Simulate your test's concurrency level. Send warm-up calls for 1 to 2 minutes to populate the connection pool.
- Confirm the pool has sufficient active connections before starting data collection.
Proper pre-warming brings the SDK connection pool to a stable reuse state, yielding latency metrics that reflect real-world production performance.
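The warm-up procedure above could be wrapped in a small harness like the one below. It is a sketch, not a benchmarking tool: measure_with_warmup, run_round, and timed_call are hypothetical helpers, and fake_synthesize_once is a placeholder that sleeps instead of calling the service. In a real test it would be replaced by one complete synthesis call, and warmup_seconds would be set to 60 to 120.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(call):
    """Run one call and return its latency in milliseconds."""
    start = time.perf_counter()
    call()
    return (time.perf_counter() - start) * 1000.0


def run_round(call, concurrency):
    """Run 'concurrency' calls in parallel and return their latencies."""
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        futures = [executor.submit(timed_call, call) for _ in range(concurrency)]
        return [future.result() for future in futures]


def measure_with_warmup(call, concurrency, warmup_seconds, measured_rounds):
    """Warm up at the target concurrency, then collect latencies."""
    deadline = time.monotonic() + warmup_seconds
    while time.monotonic() < deadline:
        run_round(call, concurrency)  # warm-up latencies are discarded
    latencies = []
    for _ in range(measured_rounds):
        latencies.extend(run_round(call, concurrency))
    return latencies


def fake_synthesize_once():
    # Placeholder for a real synthesis request.
    time.sleep(0.01)


latencies = measure_with_warmup(
    fake_synthesize_once, concurrency=4, warmup_seconds=0.05, measured_rounds=2)
print(len(latencies))  # 8
```

Only the latencies collected after the warm-up phase are returned, so one-time connection setup does not skew the reported numbers.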