Platform For AI: CosyVoice 2.0 API reference

Last Updated: Dec 16, 2025

CosyVoice 2 provides API operations that you can use to manage audio files and synthesize speech. This topic describes these API operations and explains how to call them.

Preparations

  1. Deploy the CosyVoice 2 WebUI service or the high-performance service with separate frontend and backend components. You must also mount Object Storage Service (OSS) or another storage service to store uploaded audio files. For more information, see Quickly deploy a WebUI service or Quickly deploy a high-performance service with separate frontend and backend components.

  2. Obtain the service endpoint and token.

    Important
    • For the high-performance service with separate frontend and backend components, send API calls to the Frontend service.

    • For stress testing, use the VPC endpoint. This provides significantly higher processing speeds than the public endpoint.

    1. Click the name of the CosyVoice 2 WebUI service or Frontend service. On the Overview page, in the Basic Information section, click View Invocation Information.

    2. In the Calling Information panel, on the Shared Gateway tab, obtain the service endpoint (EAS_SERVICE_URL) and token (EAS_TOKEN), and then remove the trailing / from the endpoint.

      Note
      • To use the public endpoint, the client must be able to access the public network.

      • To use the VPC endpoint, the client must be in the same virtual private cloud (VPC) as the service.

  3. Prepare an audio file.

    The following sample audio file is used in this topic:

    • Sample WAV audio file: zero_shot_prompt.wav

    • Sample audio text: I hope you can do even better than me in the future.
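The endpoint clean-up from step 2 and the request header shared by every operation in this topic can be sketched as follows. The helper names `normalize_endpoint` and `auth_headers` are illustrative assumptions and are not part of the service API.

```python
def normalize_endpoint(url: str) -> str:
    """Return the endpoint without surrounding whitespace or trailing slashes."""
    return url.strip().rstrip("/")


def auth_headers(token: str) -> dict:
    """Build the Authorization header that the API operations expect."""
    return {"Authorization": f"Bearer {token}"}
```

With these helpers, a request URL might be assembled as `normalize_endpoint(EAS_SERVICE_URL) + "/api/v1/audio/reference_audio"`, which avoids a doubled slash when the copied endpoint ends with `/`.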

API operations

Upload a reference audio file

  • Calling method

    Endpoint

    <EAS_SERVICE_URL>/api/v1/audio/reference_audio

    Request method

    POST

    Request headers

    Authorization: Bearer <EAS_TOKEN>

    Request parameters

    • file: Required. The audio file to upload. MP3, WAV, and PCM formats are supported. Type: file. Default value: None.

    • text: Required. The text content corresponding to the audio file. Type: string.

    Response parameters

    Returns a reference audio object. For more information, see Response parameter list.

    Response parameter list

    Parameter     Type       Description
    id            string     The ID of the audio file.
    filename      string     The name of the audio file.
    bytes         integer    The file size.
    created_at    integer    The UNIX timestamp when the file was created.
    text          string     The text content corresponding to the audio file.

  • Request examples

    cURL

    # Replace <EAS_SERVICE_URL> and <EAS_TOKEN> with the service endpoint and token. 
    
    curl -XPOST <EAS_SERVICE_URL>/api/v1/audio/reference_audio \
        --header 'Authorization: Bearer <EAS_TOKEN>' \
        --form 'file=@"/home/xxxx/zero_shot_prompt.wav"' \
        --form 'text="I hope you can do even better than me in the future."'

    Python

    import requests
    
    response = requests.post(
        "<EAS_SERVICE_URL>/api/v1/audio/reference_audio",  # Replace <EAS_SERVICE_URL> with the service endpoint.
        headers={
            "Authorization": "Bearer <EAS_TOKEN>",  # Replace <EAS_TOKEN> with the service token.
        },
        files={
            "file": open("./zero_shot_prompt.wav", "rb"),
        },
        data={
            "text": "I hope you can do even better than me in the future."
        }
    )
    
    print(response.text)
    
  • Response example

    {
        "id": "50a5fdb9-c3ad-445a-adbb-3be32750****",
        "filename": "zero_shot_prompt.wav",
        "bytes": 111496,
        "created_at": 1748416005,
        "text": "I hope you can do even better than me in the future."
    }
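As a rough sketch, the reference audio object can be parsed into a typed structure on the client side. The `ReferenceAudio` class and `parse_reference_audio` function are hypothetical names, and the code assumes that `created_at` is expressed in seconds, which matches the sample value above.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ReferenceAudio:
    id: str
    filename: str
    size: int          # Value of the "bytes" field.
    created_at: int    # UNIX timestamp (assumed to be in seconds).
    text: str

    @property
    def created_at_utc(self) -> datetime:
        """Return the creation time as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.created_at, tz=timezone.utc)


def parse_reference_audio(body: str) -> ReferenceAudio:
    """Parse the JSON body of a reference audio response."""
    data = json.loads(body)
    return ReferenceAudio(
        id=data["id"],
        filename=data["filename"],
        size=data["bytes"],
        created_at=data["created_at"],
        text=data["text"],
    )
```

Keep the returned `id`: the view, delete, and speech synthesis operations all take it as `reference_audio_id`.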

View the list of reference audio files

  • Calling method

    Endpoint

    <EAS_SERVICE_URL>/api/v1/audio/reference_audio

    Request method

    GET

    Request headers

    Authorization: Bearer <EAS_TOKEN>

    Request parameters

    • limit: Optional. The maximum number of files to return. Type: integer. Default value: 100.

    • order: Optional. The sorting order of objects based on the created_at timestamp. Type: string. Valid values:

      • asc: ascending

      • desc (default): descending

    Response parameters

    Returns an array of reference audio objects. For more information, see Response parameter list.

  • Request examples

    cURL

    # Replace <EAS_SERVICE_URL> and <EAS_TOKEN> with the service endpoint and token.
    
    curl -XGET '<EAS_SERVICE_URL>/api/v1/audio/reference_audio?limit=10&order=desc' \
          --header 'Authorization: Bearer <EAS_TOKEN>'

    Python

    import requests
    
    response = requests.get(
        "<EAS_SERVICE_URL>/api/v1/audio/reference_audio",  # Replace <EAS_SERVICE_URL> with the service endpoint. 
        headers={
            "Authorization": "Bearer <EAS_TOKEN>",  # Replace <EAS_TOKEN> with the service token.
        },
        params={"limit": 10, "order": "desc"},  # Optional query parameters.
    )
    
    print(response.text)

  • Response example

    [
        {
            "id": "50a5fdb9-c3ad-445a-adbb-3be32750****",
            "filename": "zero_shot_prompt.wav",
            "bytes": 111496,
            "created_at": 1748416005,
            "text": "I hope you can do even better than me in the future."
        }
    ]
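The `limit` and `order` query parameters can be validated and encoded before the request is sent. The following is a minimal sketch that uses only the standard library; `build_list_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

VALID_ORDERS = ("asc", "desc")


def build_list_url(base_url: str, limit: int = 100, order: str = "desc") -> str:
    """Build the list URL with the documented defaults (limit=100, order=desc)."""
    if order not in VALID_ORDERS:
        raise ValueError(f"order must be one of {VALID_ORDERS}, got {order!r}")
    query = urlencode({"limit": limit, "order": order})
    return f"{base_url.rstrip('/')}/api/v1/audio/reference_audio?{query}"
```

When you build the URL by hand for a shell command, quote it: an unquoted `&` is interpreted by the shell as a background operator, and the `order` parameter is silently dropped from the request.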

View a specific reference audio file

  • Calling method

    Endpoint

    <EAS_SERVICE_URL>/api/v1/audio/reference_audio/<reference_audio_id>

    Request method

    GET

    Request headers

    Authorization: Bearer <EAS_TOKEN>

    Path parameters

    reference_audio_id: Required. The ID of the reference audio file. To obtain the ID, see View the list of reference audio files. Type: string. Default value: None.

    Response parameters

    Returns a reference audio object. For more information, see Response parameter list.

  • Request examples

    cURL

    # Replace <EAS_SERVICE_URL> and <EAS_TOKEN> with the service endpoint and token.
    # Replace <reference_audio_id> with the reference audio ID.
    curl -XGET <EAS_SERVICE_URL>/api/v1/audio/reference_audio/<reference_audio_id> \
          --header 'Authorization: Bearer <EAS_TOKEN>'

    Python

    import requests
    
    response = requests.get(
        "<EAS_SERVICE_URL>/api/v1/audio/reference_audio/<reference_audio_id>",  # Replace <EAS_SERVICE_URL> with the service endpoint.
        headers={
            "Authorization": "Bearer <EAS_TOKEN>",  # Replace <EAS_TOKEN> with the service token.
        }
    )
    
    print(response.text)
    
  • Response example

    {
        "id": "50a5fdb9-c3ad-445a-adbb-3be32750****",
        "filename": "zero_shot_prompt.wav",
        "bytes": 111496,
        "created_at": 1748416005,
        "text": "I hope you can do even better than me in the future."
    }

Delete a reference audio file

  • Calling method

    Endpoint

    <EAS_SERVICE_URL>/api/v1/audio/reference_audio/<reference_audio_id>

    Request method

    DELETE

    Request headers

    Authorization: Bearer <EAS_TOKEN>

    Path parameters

    reference_audio_id: Required. The ID of the reference audio file. To obtain the ID, see View the list of reference audio files. Type: string. Default value: None.

    Response parameters

    Returns a result object that contains the code, message, and data fields, as shown in the response example.

  • Request examples

    cURL

    # Replace <EAS_SERVICE_URL> and <EAS_TOKEN> with the service endpoint and token.
    # Replace <reference_audio_id> with the reference audio ID. 
    
    curl -XDELETE <EAS_SERVICE_URL>/api/v1/audio/reference_audio/<reference_audio_id> \
          --header 'Authorization: Bearer <EAS_TOKEN>'

    Python

    import requests
    
    response = requests.delete(
        "<EAS_SERVICE_URL>/api/v1/audio/reference_audio/<reference_audio_id>",  # Replace <EAS_SERVICE_URL> with the service endpoint.
        headers={
            "Authorization": "Bearer <EAS_TOKEN>",  # Replace <EAS_TOKEN> with the service token.
        }
    )
    
    print(response.text)
    
  • Response example

    {
        "code": "OK",
        "message": "reference audio: c0939ce0-308e-4073-918f-91ac88e3**** deleted.",
        "data": {}
    }
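A client can check the `code` field to confirm that the deletion succeeded. The sketch below assumes that `"OK"` is the only success value, because this topic does not list other codes; `deletion_succeeded` is a hypothetical helper name.

```python
import json


def deletion_succeeded(body: str) -> bool:
    """Return True when the response body reports the code "OK"."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # A non-JSON body is treated as a failure.
        return False
    return isinstance(data, dict) and data.get("code") == "OK"
```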

Create speech synthesis

  • Calling method

    Endpoint

    <EAS_SERVICE_URL>/api/v1/audio/speech

    Request method

    POST

    Request headers

    • Authorization: Bearer <EAS_TOKEN>

    • Content-Type: application/json

    Request parameters

    • model: Required. The model name. Only CosyVoice2-0.5B is supported. Type: string. Default value: None.

    • input: Required. The input content. Type: object. Default value: None. The object contains the following parameters:

      • mode: Required. The speech synthesis mode. Type: string. Valid values:

        • fast_replication: fast replication

        • cross_lingual_replication: cross-lingual replication

        • natural_language_replication: natural language replication

      • text: Required. The text to synthesize. Type: string. Default value: None.

      • reference_audio_id: Required. The ID of the reference audio file. To obtain the ID, see View the list of reference audio files. Type: string. Default value: None.

      • instruct: Optional. The instruction text to dynamically adjust the voice style, such as tone, emotion, and speed. This parameter is effective only when mode is set to natural_language_replication. Type: string. Default value: None.

      • sample_rate: Optional. The audio sampling rate. Default value: 24000.

      • bit_rate: Optional. The bit rate. Type: string. Default value: 192k. Supported values: 16k, 32k, 48k, 64k, 128k, 192k, 256k, 320k, and 384k.

      • volume: Optional. The volume. Type: float. Default value: 1.0. For example, 3.0 means triple the volume, and 0.8 means 0.8 times the volume.

      • speed: Optional. The speed of the output speech. The value ranges from 0.5 to 2.0. Type: float. Default value: 1.0.

      • output_format: Optional. The format of the output audio. Supported formats: wav, mp3, and pcm. Default value: wav.

    • stream: Optional. Specifies whether to enable streaming output. Type: boolean. Default value: true.
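The constraints in the parameter list above can be checked on the client before a request is sent. The following sketch builds the request body and rejects values outside the documented sets and ranges. The function name `build_speech_request` is an assumption, and dropping `instruct` for other modes is a client-side choice based on the note that the parameter is effective only in natural_language_replication mode.

```python
MODES = {"fast_replication", "cross_lingual_replication", "natural_language_replication"}
BIT_RATES = {"16k", "32k", "48k", "64k", "128k", "192k", "256k", "320k", "384k"}
FORMATS = {"wav", "mp3", "pcm"}


def build_speech_request(mode, text, reference_audio_id, instruct=None, *,
                         speed=1.0, bit_rate="192k", output_format="wav",
                         sample_rate=24000, volume=1.0, stream=True):
    """Return the JSON body for POST /api/v1/audio/speech as a dict."""
    if mode not in MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if bit_rate not in BIT_RATES:
        raise ValueError(f"unsupported bit_rate: {bit_rate!r}")
    if output_format not in FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")

    input_obj = {
        "mode": mode,
        "text": text,
        "reference_audio_id": reference_audio_id,
        "speed": speed,
        "bit_rate": bit_rate,
        "output_format": output_format,
        "sample_rate": sample_rate,
        "volume": volume,
    }
    # instruct only takes effect in natural_language_replication mode.
    if mode == "natural_language_replication" and instruct:
        input_obj["instruct"] = instruct

    return {"model": "CosyVoice2-0.5B", "input": input_obj, "stream": stream}
```

The returned dict can be passed as the `json` argument of `requests.post`, as in the request examples in this section.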

    Response parameters

    Returns a speech chunk object in a stream. For more information, see Response parameter list.

    Response parameter list

    Parameter        Type       Description
    request_id       string     The request ID.
    output           string     The output content.
    audio            object     The audio content.
    audio.id         string     The audio ID.
    audio.data       string     The Base64-encoded audio byte stream.
    finish_reason    string     Non-streaming output: null if the request succeeds; otherwise, the reason for the failure. Streaming output: null while the audio is being generated; "stop" when the generation ends naturally or is terminated by the stop condition in the input parameters.
    usage            integer    The file size.
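As an illustration, a single speech chunk can be decoded as follows. The nesting of `finish_reason` and `audio` under `output` follows the response samples in this topic, and `decode_speech_chunk` is a hypothetical helper name.

```python
import base64
from typing import Optional, Tuple


def decode_speech_chunk(chunk: dict) -> Tuple[bytes, Optional[str]]:
    """Return the decoded audio bytes and the finish_reason of one chunk."""
    output = chunk["output"]
    audio_bytes = base64.b64decode(output["audio"]["data"])
    # Streaming: None while generating, "stop" at the end.
    # Non-streaming: None on success, otherwise the failure reason.
    return audio_bytes, output.get("finish_reason")
```

In a streaming call, append the bytes of each chunk in arrival order to reconstruct the complete audio.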

  • Request examples

    Non-streaming call

    cURL

    # Replace <EAS_SERVICE_URL> and <EAS_TOKEN> with the service endpoint and token.
    # Replace <reference_audio_id> with the reference audio ID. 
    
    curl -XPOST <EAS_SERVICE_URL>/api/v1/audio/speech \
    --header 'Authorization: Bearer <EAS_TOKEN>' \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "CosyVoice2-0.5B",
        "input": {
            "mode": "natural_language_replication",
            "reference_audio_id": "<reference_audio_id>",
            "text": "Receiving a birthday gift from a friend far away, the unexpected surprise and deep blessings filled my heart with sweet joy, and my smile bloomed like a flower.",
            "speed": 1.0,
            "output_format": "mp3",
            "sample_rate": 32000,
            "bit_rate": "48k",
            "volume": 2.0,
            "instruct": "Speak in Sichuan dialect"
        },
        "stream": false
    }'

    The Base64-encoded result is returned:

    {"output":{"finish_reason":null,"audio":{"data":"DNgB9djax9su3Ba...."}},"request_id": "f90a65be-f47b-46b5-9ddc-70bae550****"}

    Python

    Install the following dependencies:

    pip install requests==2.32.3 packaging==24.2

    import json
    import base64
    import requests
    from packaging import version
    
    required_version = "2.32.3"
    
    if version.parse(requests.__version__) < version.parse(required_version):
        raise RuntimeError(f"requests version must be >= {required_version}")
    
    with requests.post(
        "<EAS_SERVICE_URL>/api/v1/audio/speech",    # Replace <EAS_SERVICE_URL> with the service endpoint.
                                                    # Example: "http://cosyvoice-frontend-test.1534081855183999.cn-hangzhou.pai-eas.aliyuncs.com/api/v1/audio/speech"
        headers={
            "Authorization": "Bearer <EAS_TOKEN>",  # Replace <EAS_TOKEN> with the service token.
            "Content-Type": "application/json",
        },
        json={
            "model": "CosyVoice2-0.5B",
            "input": {
                "mode": "natural_language_replication",
                "reference_audio_id": "<reference_audio_id>",  # Replace <reference_audio_id> with the reference audio ID.
                "text": "Receiving a birthday gift from a friend far away, the unexpected surprise and deep blessings filled my heart with sweet joy, and my smile bloomed like a flower.",
                "output_format": "mp3",
                "sample_rate": 24000,
                "speed": 1.0,
                "bit_rate": "48k",
                "volume": 2.0,
                "instruct": "Speak in Sichuan dialect"
            },
            "stream": False
        },
        timeout=10
    ) as response:
        if response.status_code != 200:
            print(response.text)
            exit()
    
    
        data = json.loads(response.content)
        encode_buffer = data['output']['audio']['data']
        decode_buffer = base64.b64decode(encode_buffer)
    
        with open('./http_non_stream.mp3', 'wb') as f:
            f.write(decode_buffer)
    

    Streaming call

    cURL

    # Replace <EAS_SERVICE_URL> and <EAS_TOKEN> with the service endpoint and token.
    # Replace <reference_audio_id> with the reference audio ID. 
    
    curl -XPOST <EAS_SERVICE_URL>/api/v1/audio/speech \
    --header 'Authorization: Bearer <EAS_TOKEN>' \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "CosyVoice2-0.5B",
        "input": {
            "mode": "natural_language_replication",
            "reference_audio_id": "<reference_audio_id>",
            "text": "Receiving a birthday gift from a friend far away, the unexpected surprise and deep blessings filled my heart with sweet joy, and my smile bloomed like a flower.",
            "speed": 1.0,
            "output_format": "mp3",
            "sample_rate": 32000,
            "bit_rate": "48k",
            "volume": 2.0,
            "instruct": "Speak in Sichuan dialect"
        },
        "stream": true
    }'

    The Base64-encoded result is returned:

    data: {"output":{"finish_reason":null,"audio":{"data":"DNgB9djax9su3Ba...."}},"request_id": "f90a65be-f47b-46b5-9ddc-70bae550****"}
    data: {"output":{"finish_reason":null,"audio":{"data":"DNgB9djax9su3Ba...."}},"request_id": "f90a65be-f47b-46b5-9ddc-70bae550****"}
    data: {"output":{"finish_reason":null,"audio":{"data":"DNgB9djax9su3Ba...."}},"request_id": "f90a65be-f47b-46b5-9ddc-70bae550****"}
    data: {"output":{"finish_reason":null,"audio":{"data":"DNgB9djax9su3Ba...."}},"request_id": "f90a65be-f47b-46b5-9ddc-70bae550****"}
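Each line of the stream is a server-sent event whose `data:` field carries one JSON chunk. The sketch below parses such lines with only the standard library. It assumes that every event fits on a single `data:` line, as in the sample output above; a full SSE client also handles multi-line data and other fields. `iter_sse_json` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each single-line `data:` event."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # Skip blank separator lines and any other SSE fields.
        payload = line[len("data:"):].strip()
        if payload:
            yield json.loads(payload)
```

With requests, the lines could come from `response.iter_lines(decode_unicode=True)` on a response that was opened with `stream=True`.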

    Python

    Install the Python SSE client:

    pip install requests==2.32.3 packaging==24.2 sseclient-py==1.8.0 -i http://mirrors.cloud.aliyuncs.com/pypi/simple --trusted-host mirrors.cloud.aliyuncs.com

    import io
    import json
    import base64
    import wave
    import requests
    from sseclient import SSEClient # pip install sseclient-py
    from packaging import version
    
    required_version = "2.32.3"
    
    if version.parse(requests.__version__) < version.parse(required_version):
        raise RuntimeError(f"requests version must be >= {required_version}")
    
    
    with requests.post(
        "<EAS_SERVICE_URL>/api/v1/audio/speech",    # Replace <EAS_SERVICE_URL> with the service endpoint.
                                                    # Example: "http://cosyvoice-frontend-test.1534081855183999.cn-hangzhou.pai-eas.aliyuncs.com/api/v1/audio/speech"
        headers={
            "Authorization": "Bearer <EAS_TOKEN>",  # Replace <EAS_TOKEN> with the service token.
            "Content-Type": "application/json",
        },
        json={
            "model": "CosyVoice2-0.5B",
            "input": {
                "mode": "natural_language_replication",
                "reference_audio_id": "<reference_audio_id>",  # Replace <reference_audio_id> with the reference audio ID. 
                "text": "Receiving a birthday gift from a friend far away, the unexpected surprise and deep blessings filled my heart with sweet joy, and my smile bloomed like a flower.",
                "output_format": "mp3",
                "sample_rate": 24000,
                "speed": 1.0,
                "bit_rate": "48k",
                "volume": 2.0,
                "instruct": "Speak in Sichuan dialect"
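            },
            "stream": True  # Explicit for clarity; true is also the default.
        },
        stream=True,  # Read the response incrementally instead of buffering it.
        timeout=10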
    ) as response:
        if response.status_code != 200:
            print(response.text)
            exit()
    
        messages = SSEClient(response)
        with open('./http_stream.mp3', 'wb') as f:
            for i, msg in enumerate(messages.events()):
                print(f"Event: {msg.event}, Data: {msg.data}")
                data = json.loads(msg.data)
                if data.get('error') is not None:
                    print(data['error'])
                    break
                encode_buffer = data['output']['audio']['data']
                decode_buffer = base64.b64decode(encode_buffer)
                f.write(decode_buffer)
    

    WebSocket API

    Install the following dependency:

    pip install websocket-client==1.8.0 -i http://mirrors.cloud.aliyuncs.com/pypi/simple --trusted-host mirrors.cloud.aliyuncs.com

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    
    import base64
    import json
    import logging
    import sys
    import time
    import uuid
    import traceback
    import websocket
    
    
    class TTSClient:
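        def __init__(self, api_key, uri, params, log_level='INFO'):
            """
            Initializes a TTSClient instance.

            Parameters:
                api_key (str): The API key for authentication.
                uri (str): The WebSocket service endpoint.
                params (dict): The synthesis parameters, including the texts to synthesize.
                log_level (str): The logging level. Default value: 'INFO'.
            """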
            self._api_key = api_key  # Replace with your API key.
            self._uri = uri  # Replace with your WebSocket endpoint.
            self._task_id = str(uuid.uuid4())  # Generate a unique task ID.
            self._ws = None  # WebSocketApp instance
            self._task_started = False  # Specifies whether a task-started message is received.
            self._task_finished = False  # Specifies whether a task-finished or task-failed message is received.
            self._check_params(params)
            self._params = params
            self._chunk_metrics = []
            self._metrics = {}
            self._first_package_time = None
            self._last_time = None
            self._init_log(log_level)
            self.audio_data = b''
    
        def _init_log(self, log_level):
            self._log = logging.getLogger("ws_client")
            log_formatter = logging.Formatter('%(asctime)s - Process(%(process)s) - %(levelname)s - %(message)s')
            stream_handler = logging.StreamHandler(stream=sys.stdout)
            stream_handler.setFormatter(log_formatter)
            self._log.addHandler(stream_handler)
            self._log.setLevel(log_level)
    
        def get_metrics(self):
            """Gets the performance metrics of the synthesis result."""
            return self._metrics
    
        def _check_params(self, params):
            assert 'mode' in params and params['mode'] in ['fast_replication', 'cross_lingual_replication', 'natural_language_replication']
            assert 'reference_audio_id' in params
            assert 'output_format' in params and params['output_format'] in ['wav', 'mp3', 'pcm']
            if params['mode'] == 'natural_language_replication':
                assert 'instruct' in params and params['instruct']
            else:
                if 'instruct' in params:
                    del params['instruct']
    
        def on_open(self, ws):
            """
            The callback function when a WebSocket connection is established.
            Sends a run-task instruction to start a speech synthesis task.
            """
            self._log.debug("WebSocket connected")
    
            # Construct a run-task instruction.
            run_task_cmd = {
                "header": {
                    "action": "run-task",
                    "task_id": self._task_id,
                    "streaming": "duplex"
                },
                "payload": {
                    "task_group": "audio",
                    "task": "tts",
                    "function": "SpeechSynthesizer",
                    "model": "cosyvoice-v2",
                    "parameters": {
                        "mode": self._params['mode'],
                        "reference_audio_id": self._params['reference_audio_id'],
                        "output_format": self._params.get('output_format', 'wav'),
                        "sample_rate": self._params.get('sample_rate', 24000),
                        "bit_rate": self._params.get('bit_rate', '192k'),
                        "volume": self._params.get('volume', 1.0),
                        "instruct": self._params.get('instruct', ''),
                        "speed": self._params.get('speed', 1.0),
                        "debug": True,
                    },
                    "input": {}
                }
            }
    
            # Send the run-task instruction.
            ws.send(json.dumps(run_task_cmd))
            self._log.debug("run-task instruction sent")
    
        def on_message(self, ws, message):
            """
            The callback function when a message is received.
            Parses the JSON message and handles it based on the event type.
            """
            try:
                msg_json = json.loads(message)
                # self._log.debug(f"Received JSON message: {msg_json}")
                self._log.debug(f"Received JSON message: {msg_json.get('header', {}).get('event')}")
    
                if "header" in msg_json:
                    header = msg_json["header"]
    
                    if "event" in header:
                        event = header["event"]
    
                        if event == "task-started":
                            self._log.debug("Task started")
                            self._task_started = True
    
                            # Send a continue-task instruction.
                            for text in self._params['texts']:
                                self.send_continue_task(text)
    
                            # After all continue-task instructions are sent, send a finish-task instruction.
                            self.send_finish_task()
                            self._last_time = time.time()
                        elif event == "result-generated":
                            metrics = msg_json['payload']['metrics']
                            cur_time = time.time()
                            metrics['client_cost_time'] = cur_time - self._last_time
                            self._last_time = cur_time
    
                            encode_data = msg_json["payload"]["output"]["audio"]["data"]
                            decode_data = base64.b64decode(encode_data)
                            self._log.debug(f"Received audio data, size: {len(decode_data)} bytes")
                            self.audio_data += decode_data
    
                            metrics['client_rtf'] = metrics['client_cost_time'] / metrics['speech_len']
                            self._chunk_metrics.append(metrics)
    
                        elif event == "task-finished":
                            self._metrics = {
                                'client_first_package_time': self._chunk_metrics[0]['client_cost_time'],
                                "client_rtf": sum([m["client_cost_time"] for m in self._chunk_metrics]) / sum([m["speech_len"] for m in self._chunk_metrics]),
                                'client_cost_time': sum([m["client_cost_time"] for m in self._chunk_metrics]),
                                'speech_len': sum([m["speech_len"] for m in self._chunk_metrics]),
                                'server_first_package_time': self._chunk_metrics[0]['server_cost_time'],
                                'server_rtf': sum([m["server_cost_time"] for m in self._chunk_metrics]) / sum([m["speech_len"] for m in self._chunk_metrics]),
                                'server_cost_time': sum([m["server_cost_time"] for m in self._chunk_metrics]),
                                "generate_time": sum([m["generate_time"] for m in self._chunk_metrics])
                            }
    
                            self._log.debug(f"Task finished. Request performance metrics: client_first_package_time: {self._metrics['client_first_package_time']:.3f}, client_rtf: {self._metrics['client_rtf']:.3f}, client_cost_time: {self._metrics['client_cost_time']:.3f}, speech_len: {self._metrics['speech_len']:.3f}, server_cost_time: {self._metrics['server_cost_time']:.3f}, generate_time: {self._metrics['generate_time']:.3f}")
                            self._task_finished = True
                            self.close(ws)
    
                        elif event == "task-failed":
                            self._log.error(f"Task failed: {msg_json}")
                            self._task_finished = True
                            self.close(ws)
    
            except json.JSONDecodeError as e:
                self._log.error(f"JSON parsing failed: {str(e)}\t{traceback.format_exc()}")
    
        def on_error(self, ws, error):
            """The callback function when an error occurs."""
            self._log.error(f"WebSocket error: {error}\t{traceback.format_exc()}")
            self._metrics = {'error': error}
    
        def on_close(self, ws, close_status_code, close_msg):
            """The callback function when the connection is closed."""
            self._log.debug(f"WebSocket closed: {close_msg} ({close_status_code})")
    
        def send_continue_task(self, text):
            """Sends a continue-task instruction with the text to be synthesized."""
            cmd = {
                "header": {
                    "action": "continue-task",
                    "task_id": self._task_id,
                    "streaming": "duplex"
                },
                "payload": {
                    "input": {
                        "text": text
                    }
                }
            }
    
            self._ws.send(json.dumps(cmd))
            self._log.debug(f"Sent continue-task instruction, text content: {text}")
    
        def send_finish_task(self):
            """Sends a finish-task instruction to end the speech synthesis task."""
            cmd = {
                "header": {
                    "action": "finish-task",
                    "task_id": self._task_id,
                    "streaming": "duplex"
                },
                "payload": {
                    "input": {}
                }
            }
    
            self._ws.send(json.dumps(cmd))
            self._log.debug("Sent finish-task instruction")
    
        def close(self, ws):
            """Actively closes the connection."""
            if ws and ws.sock and ws.sock.connected:
                ws.close()
                self._log.debug("Connection actively closed")
    
        def run(self):
            """Starts the WebSocket client."""
            # Set the request header for authentication.
            header = {
                "Authorization": f"Bearer {self._api_key}",
            }
    
            # Create a WebSocketApp instance.
            self._ws = websocket.WebSocketApp(
                self._uri,
                header=header,
                on_open=self.on_open,
                on_message=self.on_message,
                on_error=self.on_error,
                on_close=self.on_close
            )
    
            self._log.debug("Listening for WebSocket messages...")
            self._ws.run_forever()  # Start the persistent connection listener.
    
    
    # Example
    if __name__ == "__main__":
        API_KEY = "<EAS_TOKEN>"                                      # Replace <EAS_TOKEN> with the service token.
        SERVER_URI = "ws://<EAS_SERVICE_URL>/api-ws/v1/audio/speech" # Replace <EAS_SERVICE_URL> with the service endpoint, without the http:// prefix.
                                                                     # Example: "ws://cosyvoice-frontend-test.1534081855183999.cn-hangzhou.pai-eas.aliyuncs.com/api-ws/v1/audio/speech"
        texts = [
            "Receiving a birthday gift from a friend far away, the unexpected surprise and deep blessings filled my heart with sweet joy, and my smile bloomed like a flower."
        ]
        params = {
            "mode": "natural_language_replication",
            "texts": texts,
            "reference_audio_id": "<reference_audio_id>",
            "speed": 1.0,
            "output_format": "mp3",
            "sample_rate": 24000,
            "bit_rate": "48k",
            "volume": 2.0,
            "instruct": "Speak in a calm tone"
        }
    
        client = TTSClient(API_KEY, SERVER_URI, params, log_level='DEBUG')
        client.run()
        with open('./websocket_stream.mp3', 'wb') as wfile:
            wfile.write(client.audio_data)
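The task-finished handler in the client above aggregates per-chunk metrics into a first-package latency and a real-time factor (RTF), which is the total processing time divided by the total length of the generated speech. The following standalone sketch shows the same aggregation with a guard for an empty chunk list, which would otherwise raise an IndexError. The field names come from the client code above. The assumption that `speech_len` and `client_cost_time` share the same time unit is not confirmed by this topic, and `aggregate_client_metrics` is a hypothetical helper name.

```python
def aggregate_client_metrics(chunks):
    """Aggregate per-chunk client metrics; return {} when no chunk arrived."""
    if not chunks:
        return {}
    total_cost = sum(c["client_cost_time"] for c in chunks)
    total_speech = sum(c["speech_len"] for c in chunks)
    return {
        # Latency until the first audio chunk was received.
        "client_first_package_time": chunks[0]["client_cost_time"],
        "client_cost_time": total_cost,
        "speech_len": total_speech,
        # RTF below 1.0 means audio is generated faster than it plays back.
        "client_rtf": total_cost / total_speech if total_speech else None,
    }
```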