The CosyVoice speech synthesis service uses the WebSocket protocol for real-time streaming communication. In high-concurrency scenarios, creating and destroying a WebSocket connection for each request consumes significant resources and introduces connection latency. To optimize performance and ensure stability, the DashScope SDK provides resource reuse mechanisms, such as connection pools and object pools. This document describes how to use these features in the DashScope Python and Java SDKs to efficiently call the CosyVoice service in high-concurrency scenarios.
To use a model in the China (Beijing) region, go to the API key page for the China (Beijing) region.
Prerequisites
Install a compatible version of the DashScope SDK. We recommend installing the latest version.
Python SDK: Version 1.25.2 or later
Java SDK: Version 2.16.6 or later
Python SDK: Object pool optimization
The Python SDK provides the SpeechSynthesizerObjectPool class, an object pool that manages and reuses SpeechSynthesizer objects.
When the object pool is initialized, it creates a specified number of SpeechSynthesizer instances and establishes WebSocket connections in advance. When you retrieve an object from the pool, you can send a request directly without waiting for a connection to be established. This practice effectively reduces first-packet latency. When a task is complete and the object is returned to the pool, its WebSocket connection remains active and ready for the next task.
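The reuse pattern described above can be illustrated with a minimal, self-contained sketch. This is not the SDK's implementation: `WarmObject` and `SimpleObjectPool` are hypothetical names, and the "connection" is only a flag. The sketch assumes a fixed-size pool that pre-creates its objects, reuses them on return, and falls back to a freshly created object when the pool is empty.

```python
import queue


class WarmObject:
    """Hypothetical stand-in for an object that owns an open connection."""

    created = 0  # how many objects have been constructed so far

    def __init__(self):
        WarmObject.created += 1
        self.is_open = True  # pretend the connection is established here

    def close(self):
        self.is_open = False


class SimpleObjectPool:
    """Pre-creates `size` objects and hands them out for reuse."""

    def __init__(self, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put_nowait(WarmObject())  # pay the setup cost up front

    def borrow(self):
        try:
            return self._idle.get_nowait()  # warm object, no setup latency
        except queue.Empty:
            return WarmObject()  # pool exhausted: cold object, no reuse benefit

    def give_back(self, obj):
        try:
            self._idle.put_nowait(obj)  # stays open for the next task
        except queue.Full:
            obj.close()  # surplus object: release it instead of keeping it


pool = SimpleObjectPool(size=2)
a = pool.borrow()
pool.give_back(a)
b = pool.borrow()
created_after_reuse = WarmObject.created  # borrowing again created nothing new

c = pool.borrow()
d = pool.borrow()  # pool is empty, so this one is created on demand
created_after_overflow = WarmObject.created

for obj in (b, c, d):
    pool.give_back(obj)
print(created_after_reuse, created_after_overflow, d.is_open)
```

The on-demand object `d` is the case to avoid in production: it pays the full setup cost, which is why the pool size should cover peak concurrency.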
Implementation steps
Install dependencies.
Install the DashScope dependency by running the following command:
pip install -U dashscope
Create and configure the object pool.
Set the object pool size with the max_size parameter of SpeechSynthesizerObjectPool. We recommend setting this value to 1.5 to 2 times your peak concurrency. The object pool size must not exceed the queries per second (QPS) limit for your account.
Use the following code to create a global singleton object pool with a fixed size. When the object pool is initialized, it creates the specified number of SpeechSynthesizer objects and establishes their WebSocket connections. This process takes some time.
from dashscope.audio.tts_v2 import SpeechSynthesizerObjectPool
connectionPool = SpeechSynthesizerObjectPool(max_size=20)
Retrieve a SpeechSynthesizer object from the object pool.
If the number of objects currently in use exceeds the maximum capacity of the pool, the system creates a new SpeechSynthesizer object. This new object must be initialized and must establish a new WebSocket connection. It cannot use existing connection resources from the pool and therefore does not benefit from resource reuse.
speech_synthesizer = connectionPool.borrow_synthesizer(
    model='cosyvoice-v3-flash',
    voice='longanyang',
    seed=12382,
    callback=synthesizer_callback
)
Perform speech synthesis.
Call the `call` or `streaming_call` method of the SpeechSynthesizer object to perform speech synthesis.
Return the SpeechSynthesizer object.
After the speech synthesis task is complete, return the SpeechSynthesizer object so that it can be reused by subsequent tasks. Do not return objects from incomplete or failed tasks.
connectionPool.return_synthesizer(speech_synthesizer)
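The sizing rule from the pool-configuration step above (1.5 to 2 times peak concurrency, capped by the account's QPS limit) can be written as a small helper. The function name `recommended_pool_size` and its parameters are hypothetical; only the rule itself comes from this document. The result is a candidate value for `max_size`.

```python
import math


def recommended_pool_size(peak_concurrency, qps_limit, factor=1.5):
    """Return a pool size of `factor` x peak concurrency, capped at the QPS limit.

    `factor` must stay within the documented 1.5 to 2 range.
    """
    if not 1.5 <= factor <= 2.0:
        raise ValueError('factor must be between 1.5 and 2.0')
    if peak_concurrency < 1 or qps_limit < 1:
        raise ValueError('peak_concurrency and qps_limit must be positive')
    return min(math.ceil(peak_concurrency * factor), qps_limit)


# Peak of 10 concurrent tasks with plenty of QPS headroom.
roomy = recommended_pool_size(peak_concurrency=10, qps_limit=50, factor=2.0)

# Peak of 40 concurrent tasks: twice the peak would exceed the QPS limit.
capped = recommended_pool_size(peak_concurrency=40, qps_limit=50, factor=2.0)

print(roomy, capped)
```

When the cap applies, as in the second call, the pool cannot cover the full 1.5 to 2 times margin, so some requests at peak load may fall back to newly created objects.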
Complete code
#!/usr/bin/env python3
# Copyright (C) Alibaba Group. All Rights Reserved.
# MIT License (https://opensource.org/licenses/MIT)
import os
import time
import threading

import dashscope
from dashscope.audio.tts_v2 import *

USE_CONNECTION_POOL = True

text_to_synthesize = [
    'Sentence 1: Welcome to the Alibaba Cloud speech synthesis service.',
    'Sentence 2: Welcome to the Alibaba Cloud speech synthesis service.',
    'Sentence 3: Welcome to the Alibaba Cloud speech synthesis service.',
]

connectionPool = None
if USE_CONNECTION_POOL:
    print('creating connection pool')
    start_time = time.time() * 1000
    connectionPool = SpeechSynthesizerObjectPool(max_size=3)
    end_time = time.time() * 1000
    print('connection pool created, cost: {} ms'.format(end_time - start_time))


def init_dashscope_api_key():
    '''
    Set your DashScope API key. For more information, see
    https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
    '''
    if 'DASHSCOPE_API_KEY' in os.environ:
        # Load the API key from the DASHSCOPE_API_KEY environment variable
        dashscope.api_key = os.environ['DASHSCOPE_API_KEY']
    else:
        dashscope.api_key = '<your-dashscope-api-key>'  # set the API key manually


def synthesis_text_to_speech_and_play_by_streaming_mode(text, task_id):
    '''
    Synthesize speech from the given text in streaming mode with an asynchronous call,
    and save the synthesized audio to a file as it arrives.
    For more information, see https://www.alibabacloud.com/help/document_detail/2712523.html
    '''
    global USE_CONNECTION_POOL, connectionPool
    complete_event = threading.Event()
    # Set when on_error is received, so that the failed synthesizer is not returned to the pool
    failed_event = threading.Event()

    # Define a callback to handle the result
    class Callback(ResultCallback):
        file = None

        def on_open(self):
            # When using an object pool, on_open is called after the task starts.
            self.file = open(f'result_{task_id}.mp3', 'wb')
            print(f'[task_{task_id}] start')

        def on_complete(self):
            print(f'[task_{task_id}] speech synthesis task completed successfully.')
            if self.file:
                self.file.close()
            complete_event.set()

        def on_error(self, message: str):
            print(f'[task_{task_id}] speech synthesis task failed, {message}')
            if self.file:
                self.file.close()
            # Unblock the waiting thread; otherwise complete_event.wait() never returns.
            failed_event.set()
            complete_event.set()

        def on_close(self):
            # When using an object pool, on_close is called after the task is finished.
            print(f'[task_{task_id}] finished')

        def on_event(self, message):
            # print(f'recv speech synthesis message {message}')
            pass

        def on_data(self, data: bytes) -> None:
            # Save the audio to a file (it could also be sent to a player here)
            self.file.write(data)

    # Create the callback instance
    synthesizer_callback = Callback()

    # Initialize the speech synthesizer.
    # You can customize the synthesis parameters, such as voice, format, sample_rate, or other parameters.
    if USE_CONNECTION_POOL:
        speech_synthesizer = connectionPool.borrow_synthesizer(
            model='cosyvoice-v3-flash',
            voice='longanyang',
            seed=12382,
            callback=synthesizer_callback
        )
    else:
        speech_synthesizer = SpeechSynthesizer(model='cosyvoice-v3-flash',
                                               voice='longanyang',
                                               seed=12382,
                                               callback=synthesizer_callback)

    try:
        speech_synthesizer.call(text)
    except Exception as e:
        print(f'[task_{task_id}] speech synthesis task failed, {e}')
        if USE_CONNECTION_POOL:
            # If the task fails when you use a connection pool, close the synthesizer connection manually.
            speech_synthesizer.close()
        return

    print('[task_{}] Synthesized text: {}'.format(task_id, text))
    complete_event.wait()
    if failed_event.is_set():
        if USE_CONNECTION_POOL:
            # Do not return a synthesizer from a failed task; close its connection instead.
            speech_synthesizer.close()
        return

    print('[task_{}][Metric] requestId: {}, first package delay ms: {}'.format(
        task_id,
        speech_synthesizer.get_last_request_id(),
        speech_synthesizer.get_first_package_delay()))
    if USE_CONNECTION_POOL:
        connectionPool.return_synthesizer(speech_synthesizer)


# main function
if __name__ == '__main__':
    init_dashscope_api_key()
    task_thread_list = []
    for task_id in range(3):
        thread = threading.Thread(
            target=synthesis_text_to_speech_and_play_by_streaming_mode,
            args=(text_to_synthesize[task_id], task_id))
        task_thread_list.append(thread)
    for task_thread in task_thread_list:
        task_thread.start()
    for task_thread in task_thread_list:
        task_thread.join()
    if USE_CONNECTION_POOL:
        connectionPool.shutdown()
Resource management and troubleshooting
Task success: When a speech synthesis task completes successfully, you must call connectionPool.return_synthesizer(speech_synthesizer) to return the SpeechSynthesizer object to the pool for reuse.
Important: Do not return SpeechSynthesizer objects from incomplete or failed tasks.
Task failure: If a task is interrupted by an internal SDK exception or a business logic exception, you must manually close the underlying WebSocket connection by calling speech_synthesizer.close().
After all speech synthesis tasks are complete, close the object pool by calling connectionPool.shutdown().
If a `TaskFailed` error occurs, no extra handling is required.
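The success and failure rules above can be bundled into a context manager so that callers cannot forget either branch. This is a sketch under assumptions: `borrowed_synthesizer` is a hypothetical helper, and it relies only on the three methods named in this document (`borrow_synthesizer`, `return_synthesizer`, and `close`). The `_FakePool` and `_FakeSynthesizer` classes exist only so the sketch runs without the SDK. The helper treats every exception the same way and does not special-case `TaskFailed`.

```python
from contextlib import contextmanager


@contextmanager
def borrowed_synthesizer(pool, **borrow_kwargs):
    """Borrow a synthesizer; return it on success, close it on failure."""
    synthesizer = pool.borrow_synthesizer(**borrow_kwargs)
    try:
        yield synthesizer
    except BaseException:
        # Failed or interrupted task: close the connection, do not return it.
        synthesizer.close()
        raise
    else:
        # Task finished normally: hand the object back for reuse.
        pool.return_synthesizer(synthesizer)


class _FakeSynthesizer:
    """Test double standing in for a pooled synthesizer."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class _FakePool:
    """Test double standing in for the object pool."""

    def __init__(self):
        self.returned = []

    def borrow_synthesizer(self, **kwargs):
        return _FakeSynthesizer()

    def return_synthesizer(self, synthesizer):
        self.returned.append(synthesizer)


fake_pool = _FakePool()

with borrowed_synthesizer(fake_pool, model='cosyvoice-v3-flash', voice='longanyang') as ok:
    pass  # a real task would call ok.call(text) and wait for completion here

try:
    with borrowed_synthesizer(fake_pool, model='cosyvoice-v3-flash', voice='longanyang') as failed:
        raise RuntimeError('simulated task failure')
except RuntimeError:
    pass

print(len(fake_pool.returned), ok.closed, failed.closed)
```

Because synthesis results arrive through callbacks, the body of the `with` block needs to wait for completion and raise an exception if `on_error` was received. Otherwise the helper would return a failed object to the pool.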
Java SDK: Connection and object pool optimization
The Java SDK uses a built-in connection pool and a custom object pool that work together to deliver optimal performance.
Connection pool: The SDK integrates an OkHttp3 connection pool to manage and reuse underlying WebSocket connections. This reduces network handshake overhead. This feature is enabled by default.
Object pool: Implemented based on commons-pool2, this pool maintains a set of SpeechSynthesizer objects with pre-established connections. Retrieving an object from the pool eliminates connection setup latency and significantly reduces first-packet latency.
Implementation steps
Add dependencies.
Based on your project's build tool, add `dashscope-sdk-java` and `commons-pool2` to your dependency configuration file.
The following sections show the configurations for Maven and Gradle.
Maven
Open the pom.xml file of your Maven project.
Add the following dependency information within the <dependencies> tag.
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>dashscope-sdk-java</artifactId>
    <!-- Replace 'the-latest-version' with version 2.16.6 or later. You can find the relevant version numbers at: https://mvnrepository.com/artifact/com.alibaba/dashscope-sdk-java -->
    <version>the-latest-version</version>
</dependency>
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-pool2</artifactId>
    <!-- Replace 'the-latest-version' with the latest version. You can find the relevant version numbers at: https://mvnrepository.com/artifact/org.apache.commons/commons-pool2 -->
    <version>the-latest-version</version>
</dependency>
Save the pom.xml file.
Use a Maven command, such as mvn clean install or mvn compile, to update the project dependencies.
Gradle
Open the build.gradle file of your Gradle project.
Add the following dependency information within the dependencies block.
dependencies {
    // Replace 'the-latest-version' with version 2.16.6 or later. You can find the relevant version numbers at: https://mvnrepository.com/artifact/com.alibaba/dashscope-sdk-java
    implementation group: 'com.alibaba', name: 'dashscope-sdk-java', version: 'the-latest-version'
    // Replace 'the-latest-version' with the latest version. You can find the relevant version numbers at: https://mvnrepository.com/artifact/org.apache.commons/commons-pool2
    implementation group: 'org.apache.commons', name: 'commons-pool2', version: 'the-latest-version'
}
Save the build.gradle file.
In the command line, navigate to your project's root directory and run the following Gradle command to update the project dependencies.
./gradlew build --refresh-dependencies
If you use Windows, run the following command:
gradlew build --refresh-dependencies
Configure the connection pool.
You can configure key connection pool parameters using environment variables:
Environment variable | Description |
DASHSCOPE_CONNECTION_POOL_SIZE | The size of the connection pool. Recommended value: At least twice your peak concurrency. Default value: 32. |
DASHSCOPE_MAXIMUM_ASYNC_REQUESTS | The maximum number of asynchronous requests. Recommended value: Same as DASHSCOPE_CONNECTION_POOL_SIZE. Default value: 32. |
DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST | The maximum number of asynchronous requests per host. Recommended value: Same as DASHSCOPE_CONNECTION_POOL_SIZE. Default value: 32. |
Configure the object pool.
You can configure the object pool size using an environment variable:
Environment variable | Description |
COSYVOICE_OBJECTPOOL_SIZE | The object pool size. Recommended value: 1.5 to 2 times your peak concurrency. Default value: 500. |
Important: The object pool size (COSYVOICE_OBJECTPOOL_SIZE) must be less than or equal to the connection pool size (DASHSCOPE_CONNECTION_POOL_SIZE). Otherwise, if the connection pool is full when the object pool requests an object, the calling thread is blocked until a connection becomes available.
The object pool size must not exceed the QPS limit for your account.
Use the following code to create the object pool:
class CosyvoiceObjectPool {
    // ... Other code is omitted here. For the complete example, see the full code.
    public static GenericObjectPool<SpeechSynthesizer> getInstance() {
        lock.lock();
        try {
            if (synthesizerPool == null) {
                // You can set the object pool size here or in the COSYVOICE_OBJECTPOOL_SIZE environment variable.
                // Set it to 1.5 to 2 times the maximum concurrent connections of the server.
                int objectPoolSize = getObjectivePoolSize();
                SpeechSynthesizerObjectFactory speechSynthesizerObjectFactory =
                        new SpeechSynthesizerObjectFactory();
                GenericObjectPoolConfig<SpeechSynthesizer> config =
                        new GenericObjectPoolConfig<>();
                config.setMaxTotal(objectPoolSize);
                config.setMaxIdle(objectPoolSize);
                config.setMinIdle(objectPoolSize);
                synthesizerPool =
                        new GenericObjectPool<>(speechSynthesizerObjectFactory, config);
            }
        } finally {
            // Release the lock even if pool creation throws.
            lock.unlock();
        }
        return synthesizerPool;
    }
}
Retrieve a SpeechSynthesizer object from the object pool.
If the number of objects currently in use exceeds the maximum capacity of the pool, the system creates a new SpeechSynthesizer object. This new object must be initialized and must establish a new WebSocket connection. It cannot use existing connection resources from the pool and therefore does not benefit from resource reuse.
synthesizer = CosyvoiceObjectPool.getInstance().borrowObject();
Perform speech synthesis.
Call the `call` or `streamingCall` method of the SpeechSynthesizer object to perform speech synthesis.
Return the SpeechSynthesizer object.
After the speech synthesis task is complete, return the SpeechSynthesizer object so that it can be reused by subsequent tasks. Do not return objects from incomplete or failed tasks.
CosyvoiceObjectPool.getInstance().returnObject(synthesizer);
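The environment variables from the configuration steps above can be sanity-checked before the Java process starts. The following pre-flight sketch is written in Python for brevity and is not part of the SDK. The names `read_size` and `check_pool_sizes` are hypothetical; the variable names, defaults (32 and 500), and constraints are the ones documented above.

```python
CONNECTION_POOL_DEFAULT = 32  # documented default for the three DASHSCOPE_* variables
OBJECT_POOL_DEFAULT = 500     # documented default for COSYVOICE_OBJECTPOOL_SIZE


def read_size(env, name, default):
    """Read a positive integer from a mapping of environment variables."""
    raw = env.get(name)
    if raw is None:
        return default
    value = int(raw)
    if value < 1:
        raise ValueError(f'{name} must be a positive integer, got {raw!r}')
    return value


def check_pool_sizes(env):
    """Return a list of human-readable problems; an empty list means no issue found."""
    connection_pool = read_size(env, 'DASHSCOPE_CONNECTION_POOL_SIZE', CONNECTION_POOL_DEFAULT)
    object_pool = read_size(env, 'COSYVOICE_OBJECTPOOL_SIZE', OBJECT_POOL_DEFAULT)
    problems = []
    if object_pool > connection_pool:
        problems.append(
            f'object pool ({object_pool}) is larger than connection pool ({connection_pool})')
    for name in ('DASHSCOPE_MAXIMUM_ASYNC_REQUESTS',
                 'DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST'):
        value = read_size(env, name, CONNECTION_POOL_DEFAULT)
        if value != connection_pool:
            problems.append(
                f'{name} ({value}) differs from connection pool size ({connection_pool})')
    return problems


# Nothing set: the documented defaults apply.
defaults_report = check_pool_sizes({})

# A tuned configuration for roughly 100 concurrent tasks.
tuned_report = check_pool_sizes({
    'DASHSCOPE_CONNECTION_POOL_SIZE': '200',
    'DASHSCOPE_MAXIMUM_ASYNC_REQUESTS': '200',
    'DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST': '200',
    'COSYVOICE_OBJECTPOOL_SIZE': '150',
})

print(defaults_report)
print(tuned_report)
```

In a real script, `os.environ` would be passed in place of the literal dictionaries. Note that with every variable unset, the documented defaults (an object pool of 500 and a connection pool of 32) already fail the first check, which suggests setting these variables explicitly rather than relying on the defaults.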
Complete code
package org.alibaba.bailian.example.examples;

import com.alibaba.dashscope.audio.tts.SpeechSynthesisResult;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisAudioFormat;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.exception.NoApiKeyException;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.commons.pool2.impl.GenericObjectPool;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

import java.time.LocalDateTime;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

/**
 * You need to import the org.apache.commons.pool2 and DashScope packages into your project.
 *
 * DashScope SDK versions 2.16.6 and later are optimized for high-concurrency scenarios.
 * Versions earlier than DashScope SDK 2.16.6 are not recommended for use in high-concurrency scenarios.
 *
 * Before making high-concurrency calls to the text-to-speech (TTS) service,
 * configure the connection pool parameters using the following environment variables.
 *
 * DASHSCOPE_MAXIMUM_ASYNC_REQUESTS
 * DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST
 * DASHSCOPE_CONNECTION_POOL_SIZE
 */
class SpeechSynthesizerObjectFactory
        extends BasePooledObjectFactory<SpeechSynthesizer> {
    public SpeechSynthesizerObjectFactory() {
        super();
    }

    @Override
    public SpeechSynthesizer create() throws Exception {
        return new SpeechSynthesizer();
    }

    @Override
    public PooledObject<SpeechSynthesizer> wrap(SpeechSynthesizer obj) {
        return new DefaultPooledObject<>(obj);
    }
}

class CosyvoiceObjectPool {
    public static GenericObjectPool<SpeechSynthesizer> synthesizerPool;
    public static String COSYVOICE_OBJECTPOOL_SIZE_ENV = "COSYVOICE_OBJECTPOOL_SIZE";
    public static int DEFAULT_OBJECT_POOL_SIZE = 500;
    private static Lock lock = new java.util.concurrent.locks.ReentrantLock();

    public static int getObjectivePoolSize() {
        try {
            // Integer.parseInt throws NumberFormatException if the variable is unset or not a number.
            Integer n = Integer.parseInt(System.getenv(COSYVOICE_OBJECTPOOL_SIZE_ENV));
            System.out.println("Using Object Pool Size In Env: " + n);
            return n;
        } catch (NumberFormatException e) {
            System.out.println("Using Default Object Pool Size: " + DEFAULT_OBJECT_POOL_SIZE);
            return DEFAULT_OBJECT_POOL_SIZE;
        }
    }

    public static GenericObjectPool<SpeechSynthesizer> getInstance() {
        lock.lock();
        try {
            if (synthesizerPool == null) {
                // You can set the object pool size here or in the COSYVOICE_OBJECTPOOL_SIZE environment variable.
                // Set it to 1.5 to 2 times the maximum concurrent connections of the server.
                int objectPoolSize = getObjectivePoolSize();
                SpeechSynthesizerObjectFactory speechSynthesizerObjectFactory =
                        new SpeechSynthesizerObjectFactory();
                GenericObjectPoolConfig<SpeechSynthesizer> config =
                        new GenericObjectPoolConfig<>();
                config.setMaxTotal(objectPoolSize);
                config.setMaxIdle(objectPoolSize);
                config.setMinIdle(objectPoolSize);
                synthesizerPool =
                        new GenericObjectPool<>(speechSynthesizerObjectFactory, config);
            }
        } finally {
            // Release the lock even if pool creation throws.
            lock.unlock();
        }
        return synthesizerPool;
    }
}

class SynthesizeTaskWithCallback implements Runnable {
    String[] textArray;
    String requestId;
    long timeCost;

    public SynthesizeTaskWithCallback(String[] textArray) {
        this.textArray = textArray;
    }

    @Override
    public void run() {
        SpeechSynthesizer synthesizer = null;
        long startTime = System.currentTimeMillis();
        // Set to true if onError is received or an exception is thrown
        final boolean[] hasError = {false};
        try {
            class ReactCallback extends ResultCallback<SpeechSynthesisResult> {
                ReactCallback() {}

                @Override
                public void onEvent(SpeechSynthesisResult message) {
                    if (message.getAudioFrame() != null) {
                        try {
                            byte[] bytesArray = message.getAudioFrame().array();
                            System.out.println("Received audio. Audio file stream length: " + bytesArray.length);
                        } catch (Exception e) {
                            throw new RuntimeException(e);
                        }
                    }
                }

                @Override
                public void onComplete() {}

                @Override
                public void onError(Exception e) {
                    System.out.println(e.getMessage());
                    e.printStackTrace();
                    hasError[0] = true;
                }
            }

            // Replace your-dashscope-api-key with your API key.
            String dashScopeApiKey = "your-dashscope-api-key";
            SpeechSynthesisParam param =
                    SpeechSynthesisParam.builder()
                            .model("cosyvoice-v1")
                            .voice("longxiaochun")
                            // Use PCM or MP3 for streaming synthesis.
                            .format(SpeechSynthesisAudioFormat.MP3_22050HZ_MONO_256KBPS)
                            .apiKey(dashScopeApiKey)
                            .build();
            try {
                synthesizer = CosyvoiceObjectPool.getInstance().borrowObject();
                synthesizer.updateParamAndCallback(param, new ReactCallback());
                for (String text : textArray) {
                    synthesizer.streamingCall(text);
                }
                Thread.sleep(20);
                synthesizer.streamingComplete(60000);
                requestId = synthesizer.getLastRequestId();
            } catch (Exception e) {
                System.out.println("Exception e: " + e.toString());
                hasError[0] = true;
            }
        } catch (Exception e) {
            hasError[0] = true;
            throw new RuntimeException(e);
        }
        if (synthesizer != null) {
            try {
                if (hasError[0]) {
                    // If an exception occurs, close the connection and invalidate the object in the pool.
                    synthesizer.getDuplexApi().close(1000, "bye");
                    CosyvoiceObjectPool.getInstance().invalidateObject(synthesizer);
                } else {
                    // If the task finishes normally, return the object.
                    CosyvoiceObjectPool.getInstance().returnObject(synthesizer);
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            long endTime = System.currentTimeMillis();
            timeCost = endTime - startTime;
            System.out.println("[Thread " + Thread.currentThread() + "] Speech synthesis task finished. Time cost: " + timeCost + " ms, RequestId: " + requestId);
        }
    }
}

@Slf4j
public class SynthesizeTextToSpeechWithCallbackConcurrently {
    public static void checkoutEnv(String envName, int defaultSize) {
        if (System.getenv(envName) != null) {
            System.out.println("[ENV CHECK]: " + envName + " "
                    + System.getenv(envName));
        } else {
            System.out.println("[ENV CHECK]: " + envName
                    + " Using Default which is " + defaultSize);
        }
    }

    public static void main(String[] args)
            throws InterruptedException, NoApiKeyException {
        // Check for connection pool and object pool environment variables
        checkoutEnv("DASHSCOPE_CONNECTION_POOL_SIZE", 32);
        checkoutEnv("DASHSCOPE_MAXIMUM_ASYNC_REQUESTS", 32);
        checkoutEnv("DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST", 32);
        checkoutEnv(CosyvoiceObjectPool.COSYVOICE_OBJECTPOOL_SIZE_ENV, CosyvoiceObjectPool.DEFAULT_OBJECT_POOL_SIZE);

        int runTimes = 3;
        // Create a thread pool with one worker thread per synthesis task
        ExecutorService executorService = Executors.newFixedThreadPool(runTimes);
        for (int i = 0; i < runTimes; i++) {
            // Record the task submission time
            LocalDateTime submissionTime = LocalDateTime.now();
            executorService.submit(new SynthesizeTaskWithCallback(new String[] {
                "Moonlight before my bed,", "Is it frost upon the ground?", "I lift my head to see the moon,", "I drop my head and think of home."}));
        }
        // Shut down the ExecutorService and wait for all tasks to complete
        executorService.shutdown();
        executorService.awaitTermination(1, TimeUnit.MINUTES);
        System.exit(0);
    }
}
Recommended configurations
The following configurations are based on test results from running the CosyVoice speech synthesis service on Alibaba Cloud servers with the specified configurations. High concurrency may cause task processing delays.
Concurrency per server is the number of CosyVoice speech synthesis tasks that can run simultaneously. This value is equivalent to the number of worker threads.
Server configuration (Alibaba Cloud) | Maximum concurrency per server | Object pool size | Connection pool size |
4 cores 8 GiB | 100 | 500 | 2000 |
8 cores 16 GiB | 150 | 500 | 2000 |
16 cores 32 GiB | 200 | 500 | 2000 |
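Because the per-server concurrency equals the number of worker threads, the table above can be turned into a lookup that sizes a thread pool. The sketch below is written in Python and makes two assumptions that are not in the table: the helper name `settings_for` is hypothetical, and for a core count that falls between rows it conservatively picks the largest row that does not exceed it. Memory size is ignored.

```python
from concurrent.futures import ThreadPoolExecutor

# Rows of the table above, keyed by CPU core count.
RECOMMENDED = {
    4: {'max_concurrency': 100, 'object_pool': 500, 'connection_pool': 2000},
    8: {'max_concurrency': 150, 'object_pool': 500, 'connection_pool': 2000},
    16: {'max_concurrency': 200, 'object_pool': 500, 'connection_pool': 2000},
}


def settings_for(cores):
    """Pick the largest tested configuration that does not exceed `cores`."""
    eligible = [tested for tested in RECOMMENDED if tested <= cores]
    if not eligible:
        raise ValueError(f'no tested configuration for {cores} cores')
    return RECOMMENDED[max(eligible)]


settings = settings_for(8)

# One worker thread per concurrent synthesis task; len() stands in for a real task.
with ThreadPoolExecutor(max_workers=settings['max_concurrency']) as executor:
    results = list(executor.map(len, ['a', 'bb', 'ccc']))

print(settings['max_concurrency'], results)
```

The same values would feed the object pool and connection pool settings described earlier. They are starting points from the documented tests, so measuring on the target server remains advisable.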
Resource management and troubleshooting
Task success: When a speech synthesis task completes successfully, you must call the `returnObject` method of `GenericObjectPool` to return the SpeechSynthesizer object to the pool for reuse.
In the provided code example, this is done by calling CosyvoiceObjectPool.getInstance().returnObject(synthesizer).
Important: Do not return SpeechSynthesizer objects from incomplete or failed tasks.
Task failure: If a task is interrupted by an internal SDK exception or a business logic exception, you must perform the following two operations:
Manually close the underlying WebSocket connection.
Invalidate the object in the object pool to prevent it from being used again.
// This corresponds to the following content in the current code
// Close the connection
synthesizer.getDuplexApi().close(1000, "bye");
// Invalidate the problematic synthesizer in the object pool
CosyvoiceObjectPool.getInstance().invalidateObject(synthesizer);
If a `TaskFailed` error occurs, no extra handling is required.