Qwen-Omni-Realtime is a real-time audio and video chat model from the Qwen series. It can understand streaming audio and image inputs, such as continuous image frames extracted from a video stream in real time. It can also generate high-quality text and audio in real time.
Procedure
1. Establish a connection
Qwen-Omni-Realtime is accessed through the WebSocket protocol. You can connect with a native WebSocket client, as in the following Python example, or with the DashScope SDK for Python or Java.
A single WebSocket session for Qwen-Omni-Realtime can last for a maximum of 120 minutes. After this limit is reached, the service automatically closes the connection.
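If an application needs conversations longer than this, it has to open a new session before the limit is reached. The sketch below shows one way to track session age on the client. The `SessionTimer` class and its 60-second safety margin are hypothetical choices for illustration, not part of the DashScope SDK. The clock is injectable so the logic can be tested without waiting.

```python
import time

# Documented maximum lifetime of one Qwen-Omni-Realtime WebSocket session.
MAX_SESSION_SECONDS = 120 * 60


class SessionTimer:
    """Tracks how long the current WebSocket session has been open.

    Hypothetical helper: the default 60-second margin is an illustrative
    assumption, not a value defined by the service.
    """

    def __init__(self, margin_seconds: float = 60.0, clock=time.monotonic):
        self._clock = clock
        self._margin = margin_seconds
        self._started = clock()

    def elapsed(self) -> float:
        # Seconds since the session was opened.
        return self._clock() - self._started

    def should_reconnect(self) -> bool:
        # True once the session is within `margin_seconds` of the 120-minute limit.
        return self.elapsed() >= MAX_SESSION_SECONDS - self._margin
```

A client loop could check `should_reconnect()` periodically and, when it returns True, close the connection, open a new one, and send `session.update` again for the new session.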
Native WebSocket connection
The following configuration items are required:
Configuration item | Description |
Endpoint | China (Beijing): wss://dashscope.aliyuncs.com/api-ws/v1/realtime; International (Singapore): wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime |
Query parameter | The query parameter is `model`. Set it to the name of the model you want to access. Example: ?model=qwen3-omni-flash-realtime |
Request header | Use a Bearer token for authentication: Authorization: Bearer DASHSCOPE_API_KEY, where DASHSCOPE_API_KEY is the API key that you obtained from Model Studio. |
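The three items combine into the connection URL and the header list. As a minimal sketch, the hypothetical helper `build_connection` assembles them. The region keys `"beijing"` and `"singapore"` are names chosen here for illustration, and the header list uses the `Name: value` string format that `websocket-client` accepts.

```python
import os
from urllib.parse import urlencode

# Endpoints from the configuration table.
ENDPOINTS = {
    "beijing": "wss://dashscope.aliyuncs.com/api-ws/v1/realtime",
    "singapore": "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime",
}


def build_connection(model: str, region: str = "singapore", api_key=None):
    """Return (url, headers) for a native WebSocket connection (hypothetical helper)."""
    key = api_key or os.getenv("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError("Set DASHSCOPE_API_KEY or pass api_key explicitly.")
    # The model name goes into the `model` query parameter.
    url = f"{ENDPOINTS[region]}?{urlencode({'model': model})}"
    # Bearer-token authentication header.
    headers = [f"Authorization: Bearer {key}"]
    return url, headers
```

Keep in mind that API keys for the Beijing and Singapore regions are different, so the key must match the chosen endpoint.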
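# pip install websocket-client
import json
import os

import websocket

# The API keys for the Singapore and Beijing regions are different.
API_KEY = os.getenv("DASHSCOPE_API_KEY")
if not API_KEY:
    raise RuntimeError("The DASHSCOPE_API_KEY environment variable is not set.")
# Singapore endpoint. For the Beijing region, use wss://dashscope.aliyuncs.com/api-ws/v1/realtime
API_URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime?model=qwen3-omni-flash-realtime"

headers = [
    "Authorization: Bearer " + API_KEY
]

def on_open(ws):
    print(f"Connected to server: {API_URL}")

def on_message(ws, message):
    # Every server event is a JSON text message.
    data = json.loads(message)
    print("Received event:", json.dumps(data, indent=2))

def on_error(ws, error):
    print("Error:", error)

def on_close(ws, close_status_code, close_msg):
    print(f"Connection closed (code={close_status_code}, msg={close_msg}).")

ws = websocket.WebSocketApp(
    API_URL,
    header=headers,
    on_open=on_open,
    on_message=on_message,
    on_error=on_error,
    on_close=on_close,
)
ws.run_forever()
DashScope SDK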
# SDK version 1.23.9 or later
import os
import json
from dashscope.audio.qwen_omni import OmniRealtimeConversation, OmniRealtimeCallback
import dashscope

# The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured an API key, change the following line to dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

class PrintCallback(OmniRealtimeCallback):
    def on_open(self) -> None:
        print("Connected Successfully")

    def on_event(self, response: dict) -> None:
        print("Received event:")
        print(json.dumps(response, indent=2, ensure_ascii=False))

    def on_close(self, close_status_code: int, close_msg: str) -> None:
        print(f"Connection closed (code={close_status_code}, msg={close_msg}).")

callback = PrintCallback()
conversation = OmniRealtimeConversation(
    model="qwen3-omni-flash-realtime",
    callback=callback,
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    url="wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
)
try:
    conversation.connect()
    print("Conversation started. Press Ctrl+C to exit.")
    conversation.thread.join()
except KeyboardInterrupt:
    conversation.close()
// SDK version 2.20.9 or later
import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import java.util.concurrent.CountDownLatch;

public class Main {
    public static void main(String[] args) throws InterruptedException, NoApiKeyException {
        CountDownLatch latch = new CountDownLatch(1);
        OmniRealtimeParam param = OmniRealtimeParam.builder()
                .model("qwen3-omni-flash-realtime")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                .build();
        OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, new OmniRealtimeCallback() {
            @Override
            public void onOpen() {
                System.out.println("Connected Successfully");
            }

            @Override
            public void onEvent(JsonObject message) {
                System.out.println(message);
            }

            @Override
            public void onClose(int code, String reason) {
                System.out.println("connection closed code: " + code + ", reason: " + reason);
                latch.countDown();
            }
        });
        conversation.connect();
        latch.await();
        conversation.close(1000, "bye");
        System.exit(0);
    }
}
2. Configure the session
Send the session.update client event:
{
  // The ID of this event, generated by the client.
  "event_id": "event_ToPZqeobitzUJnt3QqtWg",
  // The event type. This is fixed to session.update.
  "type": "session.update",
  // Session configuration.
  "session": {
    // The output modalities. Supported values are ["text"] (text only) or ["text", "audio"] (text and audio).
    "modalities": [
      "text",
      "audio"
    ],
    // The voice for the output audio.
    "voice": "Cherry",
    // The input audio format. Only pcm16 is supported.
    "input_audio_format": "pcm16",
    // The output audio format.
    // Qwen3-Omni-Flash-Realtime supports only pcm24. Qwen-Omni-Turbo-Realtime supports only pcm16.
    "output_audio_format": "pcm24",
    // The system message, used to set the model's goal or role.
    "instructions": "You are an AI customer service agent for a five-star hotel. Answer customer inquiries about room types, facilities, prices, and booking policies accurately and in a friendly manner. Always respond with a professional and helpful attitude. Do not provide unconfirmed information or information beyond the scope of the hotel's services.",
    // Voice activity detection (VAD) configuration. Pass a configuration object to enable VAD; the server then detects the start and end of speech automatically.
    // Set to null to let the client decide when to request a model response.
    "turn_detection": {
      // The VAD type. It must be set to server_vad.
      "type": "server_vad",
      // The VAD detection threshold. Increase this value in noisy environments and decrease it in quiet environments.
      "threshold": 0.5,
      // The duration of silence, in milliseconds, that marks the end of speech. Once it is exceeded, a model response is triggered.
      "silence_duration_ms": 800
    }
  }
}
3. Input audio and images
The client sends Base64-encoded audio and image data to the server buffer using the input_audio_buffer.append and input_image_buffer.append events. Audio input is required. Image input is optional.
Images can be from local files or captured in real time from a video stream.
When server-side Voice Activity Detection (VAD) is enabled, the server automatically submits the data and triggers a response when it detects the end of speech. When VAD is disabled (manual mode), the client must send the input_audio_buffer.commit event to submit the data and then send the response.create event to request a response.
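A sketch of how a client might build these events as JSON strings. It assumes OpenAI-Realtime-style payload field names (`audio` for `input_audio_buffer.append`, `image` for `input_image_buffer.append`), which you should confirm in the API reference. The `InputBuffer` class is hypothetical; it also enforces the documented rule that at least one audio append must precede any image append.

```python
import base64
import json
import uuid


class InputBuffer:
    """Builds input buffer client events (hypothetical helper).

    Assumption: payload fields are named "audio" and "image".
    """

    def __init__(self):
        self._audio_sent = False

    @staticmethod
    def _event(event_type: str, **fields) -> str:
        # Each client event carries a client-generated event_id, as in session.update.
        return json.dumps({"event_id": f"event_{uuid.uuid4().hex}", "type": event_type, **fields})

    def append_audio(self, pcm16_bytes: bytes) -> str:
        self._audio_sent = True
        audio_b64 = base64.b64encode(pcm16_bytes).decode("ascii")
        return self._event("input_audio_buffer.append", audio=audio_b64)

    def append_image(self, image_bytes: bytes) -> str:
        # At least one audio append is required before any image append.
        if not self._audio_sent:
            raise RuntimeError("Send input_audio_buffer.append before input_image_buffer.append.")
        image_b64 = base64.b64encode(image_bytes).decode("ascii")
        return self._event("input_image_buffer.append", image=image_b64)

    def commit(self) -> str:
        # Manual mode only. Assumption: the audio-before-image rule applies to each new buffer.
        self._audio_sent = False
        return self._event("input_audio_buffer.commit")
```

In VAD mode a client would send only the append events, because the server commits the buffer itself.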
4. Receive model responses
The format of the model response depends on the configured output modalities.
Text only
Receive streaming text through the response.text.delta event. Get the full text with the response.text.done event.
Text and audio
Text: Receive streaming text through the response.audio_transcript.delta event. Get the full text with the response.audio_transcript.done event.
Audio: Get Base64-encoded streaming audio output data through the response.audio.delta event. The response.audio.done event indicates that the audio data generation is complete.
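A client typically routes each server event by its `type` field. The following sketch accumulates both output forms described above. It assumes, in the style of the OpenAI Realtime API, that every `*.delta` event carries its chunk in a `delta` field; confirm the field names in the API reference. `ResponseCollector` is a hypothetical name.

```python
import base64
import json


class ResponseCollector:
    """Accumulates streamed model output from server events (hypothetical helper)."""

    def __init__(self):
        self.text_parts = []
        self.audio = bytearray()
        self.text_done = False
        self.audio_done = False

    def handle(self, message: str) -> None:
        event = json.loads(message)
        kind = event.get("type")
        if kind in ("response.text.delta", "response.audio_transcript.delta"):
            # Text-only output uses response.text.*; text plus audio uses response.audio_transcript.*.
            self.text_parts.append(event.get("delta", ""))
        elif kind in ("response.text.done", "response.audio_transcript.done"):
            self.text_done = True
        elif kind == "response.audio.delta":
            # Audio chunks arrive Base64-encoded; decode and append the raw bytes.
            self.audio.extend(base64.b64decode(event.get("delta", "")))
        elif kind == "response.audio.done":
            self.audio_done = True

    @property
    def text(self) -> str:
        return "".join(self.text_parts)
```

The collected `audio` bytes are raw PCM in the configured `output_audio_format`; feeding them to a player such as a PyAudio output stream is left out of this sketch.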
Availability
Region availability
Model availability
Qwen3-Omni-Flash-Realtime is the latest real-time multimodal model in the Qwen series. Compared to the previous generation model Qwen-Omni-Turbo-Realtime, which will no longer be updated, Qwen3-Omni-Flash-Realtime has the following advantages:
Supported languages
The number of supported languages has increased to 10, including Chinese (Mandarin and dialects such as Shanghainese, Cantonese, and Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, and Korean. Qwen-Omni-Turbo-Realtime supports only two languages: Chinese (Mandarin) and English.
Supported voices
qwen3-omni-flash-realtime-2025-12-01 supports 49 voices. qwen3-omni-flash-realtime-2025-09-15 and qwen3-omni-flash-realtime support 17. Qwen-Omni-Turbo-Realtime supports only 4. For more information, see Voice list.
International (Singapore)
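Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota |
qwen3-omni-flash-realtime (offers the same capabilities as qwen3-omni-flash-realtime-2025-12-01) | Stable | 65,536 | 49,152 | 16,384 | 1 million tokens (regardless of modality), valid for 90 days after you activate Model Studio |
qwen3-omni-flash-realtime-2025-12-01 | Snapshot | 65,536 | 49,152 | 16,384 | 1 million tokens (regardless of modality), valid for 90 days after you activate Model Studio |
qwen3-omni-flash-realtime-2025-09-15 | Snapshot | 65,536 | 49,152 | 16,384 | 1 million tokens (regardless of modality), valid for 90 days after you activate Model Studio |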
China (Beijing)
Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota |
qwen3-omni-flash-realtime (offers the same capabilities as qwen3-omni-flash-realtime-2025-12-01) | Stable | 65,536 | 49,152 | 16,384 | No free quota |
qwen3-omni-flash-realtime-2025-12-01 | Snapshot | 65,536 | 49,152 | 16,384 | No free quota |
qwen3-omni-flash-realtime-2025-09-15 | Snapshot | 65,536 | 49,152 | 16,384 | No free quota |
Getting started
Get an API key and export it as the DASHSCOPE_API_KEY environment variable.
Choose a programming language you are familiar with and follow the steps below to quickly start a real-time conversation with Qwen-Omni-Realtime.
DashScope Python SDK
Prepare the runtime environment
Your Python version must be 3.10 or later.
First, install pyaudio based on your operating system.
macOS
brew install portaudio && pip install pyaudio
Debian/Ubuntu
If you are not using a virtual environment, you can install it directly using the system package manager:
sudo apt-get install python3-pyaudio
If you are in a virtual environment, you must first install the compilation dependencies:
sudo apt update
sudo apt install -y python3-dev portaudio19-dev
Then, install it using pip in the activated virtual environment:
pip install pyaudio
CentOS
sudo yum install -y portaudio portaudio-devel && pip install pyaudio
Windows
pip install pyaudio
After the installation is complete, install the dependencies using pip:
pip install websocket-client dashscope
Choose an interaction mode
VAD mode (automatically detects the start and end of speech)
The server automatically determines when the user starts and stops speaking and responds accordingly.
Manual mode (press to talk, release to send)
The client controls the start and end of speech. After the user finishes speaking, the client must actively send a message to the server.
VAD mode
Create a new Python file named vad_dash.py and copy the following code into the file:
Run vad_dash.py to have a real-time conversation with Qwen-Omni-Realtime through your microphone. The system detects the start and end of your speech and automatically sends it to the server without manual intervention.
Manual mode
Create a new Python file named manual_dash.py and copy the following code into the file:
Run manual_dash.py, press Enter to start speaking, and press Enter again to receive the model's audio response.
DashScope Java SDK
Choose an interaction mode
VAD mode (automatically detects the start and end of speech)
The Realtime API automatically determines when the user starts and stops speaking and responds accordingly.
Manual mode (press to talk, release to send)
The client controls the start and end of speech. After the user finishes speaking, the client must actively send a message to the server.
VAD mode
Run the OmniServerVad.main() method to have a real-time conversation with Qwen-Omni-Realtime through your microphone. The system detects the start and end of your speech and automatically sends it to the server without manual intervention.
Manual mode
Run the OmniWithoutServerVad.main() method. Press Enter to start recording. During recording, press Enter again to stop recording and send the audio. The model's response is then received and played.
WebSocket (Python)
Prepare the runtime environment
Your Python version must be 3.10 or later.
First, install pyaudio based on your operating system.
macOS
brew install portaudio && pip install pyaudio
Debian/Ubuntu
sudo apt-get install python3-pyaudio
or
pip install pyaudio
We recommend using pip install pyaudio. If the installation fails, first install the portaudio dependency for your operating system.
CentOS
sudo yum install -y portaudio portaudio-devel && pip install pyaudio
Windows
pip install pyaudio
After the installation is complete, install the WebSocket dependency using pip:
pip install websockets==15.0.1
Create the client
Create a new Python file named omni_realtime_client.py in your local directory and copy the following code into the file:
Choose an interaction mode
VAD mode (automatically detects the start and end of speech)
The Realtime API automatically determines when the user starts and stops speaking and responds accordingly.
Manual mode (press to talk, release to send)
The client controls the start and end of speech. After the user finishes speaking, the client must actively send a message to the server.
VAD mode
In the same directory as omni_realtime_client.py, create another Python file named vad_mode.py and copy the following code into the file:
Run vad_mode.py to have a real-time conversation with Qwen-Omni-Realtime through your microphone. The system detects the start and end of your speech and automatically sends it to the server without manual intervention.
Manual mode
In the same directory as omni_realtime_client.py, create another Python file named manual_mode.py and copy the following code into the file:
Run manual_mode.py, press Enter to start speaking, and press Enter again to receive the model's audio response.
Interaction flow
VAD mode
To enable VAD mode, set session.turn_detection in the session.update event to a configuration object whose type is "server_vad". In this mode, the server automatically detects the start and end of speech and responds accordingly. This mode is suitable for voice call scenarios.
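In `session.update`, the two modes differ only in the `turn_detection` value. A minimal sketch, reusing the fields from the session configuration example: `build_session_update` is a hypothetical helper, and choosing `output_audio_format` by matching the model-name prefix is an assumption based on the note that Qwen3-Omni-Flash-Realtime outputs pcm24 while Qwen-Omni-Turbo-Realtime outputs pcm16.

```python
import json
import uuid


def build_session_update(model: str, vad: bool = True, voice: str = "Cherry",
                         instructions: str = "", silence_duration_ms: int = 800) -> str:
    """Build a session.update event for VAD mode (vad=True) or manual mode (vad=False)."""
    # Assumption: Qwen3-Omni-Flash-Realtime model names share this prefix.
    output_format = "pcm24" if model.startswith("qwen3-omni-flash-realtime") else "pcm16"
    turn_detection = (
        {"type": "server_vad", "threshold": 0.5, "silence_duration_ms": silence_duration_ms}
        if vad
        else None  # None serializes to JSON null: the client triggers each response.
    )
    return json.dumps({
        "event_id": f"event_{uuid.uuid4().hex}",
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "voice": voice,
            "input_audio_format": "pcm16",
            "output_audio_format": output_format,
            "instructions": instructions,
            "turn_detection": turn_detection,
        },
    })
```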
The interaction flow is as follows:
The server detects the start of speech and sends the input_audio_buffer.speech_started event.
The client can send input_audio_buffer.append and input_image_buffer.append events at any time to append audio and images to the buffer.
Before sending an input_image_buffer.append event, you must send at least one input_audio_buffer.append event.
The server detects the end of speech and sends the input_audio_buffer.speech_stopped event.
The server sends the input_audio_buffer.committed event to commit the audio buffer.
The server sends a conversation.item.created event, which contains the user message item created from the buffer.
Lifecycle | Client events | Server events |
Session initialization | Session configuration | Session created; session configuration updated |
User audio input | Add audio to the buffer; add an image to the buffer | input_audio_buffer.speech_started: speech start detected; input_audio_buffer.speech_stopped: speech end detected; server received the submitted audio |
Server audio output | None | Server starts generating a response; new output content during response; conversation item created; new output content added to the assistant message; response.audio_transcript.delta: incrementally generated transcribed text; incrementally generated audio from the model; response.audio_transcript.done: text transcription complete; audio generation complete; streaming of text or audio content for the assistant message is complete; streaming of the entire output item for the assistant message is complete; response complete |
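Because the server drives turn-taking in VAD mode, the client mainly reacts to events. The sketch below tracks where the current turn stands, using the event types named in this section (`input_audio_buffer.speech_started`, `input_audio_buffer.speech_stopped`, `input_audio_buffer.committed`) plus the `response.audio.done` and `response.text.done` events from the response section. The state names and the `VadTurnTracker` class are hypothetical.

```python
import json


class VadTurnTracker:
    """Tracks the user's turn from VAD-mode server events (hypothetical helper)."""

    # Server event type -> illustrative turn state.
    TRANSITIONS = {
        "input_audio_buffer.speech_started": "user_speaking",
        "input_audio_buffer.speech_stopped": "awaiting_commit",
        "input_audio_buffer.committed": "awaiting_response",
        "response.audio.done": "idle",  # text and audio output
        "response.text.done": "idle",   # text-only output
    }

    def __init__(self):
        self.state = "idle"
        self.history = []

    def handle(self, message: str) -> str:
        event_type = json.loads(message).get("type")
        new_state = self.TRANSITIONS.get(event_type)
        if new_state is not None:
            # Events not listed (for example, delta events) leave the state unchanged.
            self.state = new_state
            self.history.append(new_state)
        return self.state
```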
Manual mode
Set session.turn_detection in the session.update event to null to enable Manual mode. In this mode, the client requests a server response by explicitly sending the input_audio_buffer.commit and response.create events. This mode is suitable for push-to-talk scenarios, such as sending voice messages in chat applications.
The interaction flow is as follows:
The client can send input_audio_buffer.append and input_image_buffer.append events at any time to append audio and images to the buffer.
Before sending an input_image_buffer.append event, you must send at least one input_audio_buffer.append event.
The client sends the input_audio_buffer.commit event to commit the audio and image buffers. This informs the server that all user input, including audio and images, for the current turn has been sent.
The server responds with an input_audio_buffer.committed event.
The client sends a response.create event and waits for the model's output from the server.
The server responds with a conversation.item.created event.
Lifecycle | Client events | Server events |
Session initialization | Session configuration | Session created; session configuration updated |
User audio input | Add audio to the buffer; add an image to the buffer; submit audio and images to the server; create a model response | Server received the submitted audio |
Server audio output | Clear the audio from the buffer | Server starts generating a response; new output content during response; conversation item created; new output content added to the assistant message item; response.audio_transcript.delta: incrementally generated transcribed text; incrementally generated audio from the model; response.audio_transcript.done: text transcription complete; audio generation complete; streaming of text or audio content for the assistant message is complete; streaming of the entire output item for the assistant message is complete; response complete |
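In manual mode the client itself commits the buffer and requests the response. The sketch below returns the client events for one push-to-talk turn in the order listed above. `manual_turn_events` is a hypothetical helper, and the `audio`/`image` payload field names are assumed to follow the OpenAI Realtime style; verify them in the API reference.

```python
import base64
import json
import uuid


def _event(event_type, **fields):
    # Client-generated event_id, as in the session.update example.
    return json.dumps({"event_id": f"event_{uuid.uuid4().hex}", "type": event_type, **fields})


def manual_turn_events(audio_chunks, image=None):
    """Return the client events for one push-to-talk turn, in send order."""
    chunks = list(audio_chunks)
    if not chunks:
        raise ValueError("Audio input is required for every turn.")
    events = [_event("input_audio_buffer.append", audio=base64.b64encode(c).decode("ascii"))
              for c in chunks]
    if image is not None:
        # Allowed here because at least one audio append is already queued.
        events.append(_event("input_image_buffer.append",
                             image=base64.b64encode(image).decode("ascii")))
    # Commit the audio and image buffers, then explicitly request a model response.
    events.append(_event("input_audio_buffer.commit"))
    events.append(_event("response.create"))
    return events
```

Each returned string would then be sent over the open connection, for example with `ws.send(message)` on a `websocket-client` `WebSocketApp`.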
API reference
Billing and rate limiting
Billing rules
Qwen-Omni-Realtime is billed based on the number of tokens used for different input modalities, such as audio and images. For more information about billing, see Models.
Throttling
For more information about model throttling rules, see Throttling.
Error codes
If a call fails, see Error messages for troubleshooting.
Voice list
Set the voice request parameter to the value in the voice parameter column.
qwen3-omni-flash-realtime-2025-12-01
Name | voice parameter | Description | Supported languages |
Cherry | Cherry | A sunny, positive, and naturally friendly young woman | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Serena | Serena | A gentle young woman | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ethan | Ethan | Standard Mandarin with a slight northern accent. A bright, warm, energetic, and vibrant voice | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Chelsie | Chelsie | An anime-style virtual girlfriend | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Momo | Momo | A playful and cute voice to cheer you up | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Vivian | Vivian | A cool, cute, and slightly feisty voice | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Moon | Moon | The carefree and handsome Moon | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Maia | Maia | A blend of intelligence and gentleness | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Kai | Kai | A spa for your ears | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Nofish | Nofish | A designer who does not use retroflex consonants | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Bella | Bella | A little girl who drinks but never gets drunk | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Jennifer | Jennifer | A premium, cinematic American English female voice | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ryan | Ryan | A rhythmic, dramatic voice with realism and tension | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Katerina | Katerina | A mature female voice with rich rhythm and lingering resonance | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Aiden | Aiden | An American young man skilled in cooking | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Eldric Sage | Eldric Sage | A calm and wise elder, weathered like an ancient pine yet with a mind as clear as a mirror | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Mia | Mia | Gentle as spring water, quiet as the first snow | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Mochi | Mochi | A clever and bright "little adult" who retains childlike innocence yet possesses Zen-like wisdom | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Bellona | Bellona | A powerful and sonorous voice with clear articulation that brings characters to life and stirs passion in the listener. The clash of swords and thunder of hooves echo in your dreams, revealing a world of countless voices through perfectly clear and resonant tones. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Vincent | Vincent | A uniquely raspy, smoky voice that instantly evokes tales of vast armies and heroic adventures | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Bunny | Bunny | An extremely cute little girl bursting with moe appeal | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Neil | Neil | A professional news anchor's voice with a flat baseline intonation and precise, clear pronunciation | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Elias | Elias | Maintains academic rigor while using storytelling techniques to transform complex knowledge into digestible cognitive modules | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Arthur | Arthur | A rustic voice soaked by time and dry tobacco, leisurely recounting village tales and oddities | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Nini | Nini | A soft and sticky voice like mochi, whose drawn-out calls of "older brother" are sweet enough to melt your bones | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ebona | Ebona | Her whisper is like a rusty key slowly turning the darkest corners of your innermost self—where all your unacknowledged childhood shadows and unknown fears lie hidden | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Seren | Seren | A gentle and soothing voice to help you fall asleep faster. Good night and sweet dreams | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Pip | Pip | Mischievous yet full of childlike innocence—is this the Shin-chan from your memory? | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Stella | Stella | Normally a sickeningly sweet, dazed girl voice, but when shouting "In the name of the moon, I'll punish you!", it instantly fills with undeniable love and justice | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Bodega | Bodega | An enthusiastic Spanish uncle | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sonrisa | Sonrisa | A warm and cheerful Latin American older sister | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Alek | Alek | Cold like Russia at first utterance, yet warm beneath the wool coat | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Dolce | Dolce | A lazy Italian uncle | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sohee | Sohee | A gentle, cheerful, and emotionally expressive Korean older sister | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ono Anna | Ono Anna | A mischievous childhood sweetheart | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Lenn | Lenn | Rational at heart, rebellious in the details—a young German man who wears a suit and listens to post-punk | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Emilien | Emilien | A romantic French older brother | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Andre | Andre | A magnetic, natural, comfortable, and calm male voice | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Radio Gol | Radio Gol | Football poet Rádio Gol! Today I will call the football match for you using names. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Shanghai-Jada | Jada | A lively Shanghai auntie | Chinese (Shanghainese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Beijing-Dylan | Dylan | A teenager raised in Beijing hutongs | Chinese (Beijing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Nanjing-Li | Li | A patient yoga teacher | Chinese (Nanjing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Shaanxi-Marcus | Marcus | Broad-faced and brief-spoken, sincere-hearted and deep-voiced—the authentic flavor of Shaanxi | Chinese (Shaanxi dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Minnan-Roy | Roy | A humorous, straightforward, and lively Taiwanese young man | Chinese (Min Nan), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Tianjin-Peter | Peter | Professional straight man in Tianjin crosstalk | Chinese (Tianjin dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sichuan-Sunny | Sunny | A Sichuan girl whose sweetness melts your heart | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sichuan-Eric | Eric | A Sichuan Chengdu man who rises above the mundane | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Cantonese-Rocky | Rocky | The humorous and witty Rocky, here for online chatting | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Cantonese-Kiki | Kiki | A sweet Hong Kong best friend | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
qwen3-omni-flash-realtime, qwen3-omni-flash-realtime-2025-09-15
Name | voice parameter | Description | Supported languages |
Cherry | Cherry | A sunny, positive, and naturally friendly young woman | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ethan | Ethan | Standard Mandarin with a slight northern accent. A bright, warm, energetic, and vibrant voice | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Nofish | Nofish | A designer who does not use retroflex consonants | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Jennifer | Jennifer | A premium, cinematic American English female voice | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ryan | Ryan | A rhythmic, dramatic voice with realism and tension | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Katerina | Katerina | A mature female voice with rich rhythm and lingering resonance | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Elias | Elias | Maintains academic rigor while using storytelling techniques to transform complex knowledge into digestible cognitive modules | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Shanghai-Jada | Jada | A lively Shanghai auntie | Chinese (Shanghainese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Beijing-Dylan | Dylan | A teenager raised in Beijing hutongs | Chinese (Beijing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sichuan-Sunny | Sunny | A Sichuan girl whose sweetness melts your heart | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Nanjing-Li | Li | A patient yoga teacher | Chinese (Nanjing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Shaanxi-Marcus | Marcus | Broad-faced and brief-spoken, sincere-hearted and deep-voiced—the authentic flavor of Shaanxi | Chinese (Shaanxi dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Minnan-Roy | Roy | A humorous, straightforward, and lively Taiwanese young man | Chinese (Min Nan), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Tianjin-Peter | Peter | Professional straight man in Tianjin crosstalk | Chinese (Tianjin dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Cantonese-Rocky | Rocky | The humorous and witty Rocky, here for online chatting | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Cantonese-Kiki | Kiki | A sweet Hong Kong best friend | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sichuan-Eric | Eric | A Sichuan Chengdu man who rises above the mundane | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
Qwen-Omni-Turbo-Realtime
Name | voice parameter | Description | Supported languages |
Cherry | Cherry | A sunny, positive, and naturally friendly young woman | Chinese, English | |
Serena | Serena | A gentle young woman | Chinese, English | |
Ethan | Ethan | Standard Mandarin with a slight northern accent. A bright, warm, energetic, and vibrant voice | Chinese, English | |
Chelsie | Chelsie | An anime-style virtual girlfriend | Chinese, English |