Qwen-Omni-Realtime is a real-time audio and video chat model from the Qwen series. It can understand streaming audio and image inputs, such as continuous image frames extracted from a video stream in real time. It can also generate high-quality text and audio in real time.
Procedure
1. Establish a connection
Qwen-Omni-Realtime is accessed through the WebSocket protocol. You can establish a connection using the following Python code example or the DashScope SDK.
A single WebSocket session for Qwen-Omni-Realtime can last for a maximum of 30 minutes. After this limit is reached, the service automatically closes the connection.
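Because the service closes a session after 30 minutes, a long-running client may want to reconnect before the limit is reached. The following is a minimal sketch of one way to track this. The `SessionTimer` class, the 60-second safety margin, and the injectable clock are illustrative assumptions and are not part of the API.

```python
import time

MAX_SESSION_SECONDS = 30 * 60  # session limit stated above


class SessionTimer:
    """Tracks how long a WebSocket session has been open (hypothetical helper)."""

    def __init__(self, margin_seconds: float = 60.0, clock=time.monotonic):
        self._clock = clock
        self._margin = margin_seconds
        self._started = clock()

    def remaining(self) -> float:
        """Seconds left before the server-side limit is reached."""
        elapsed = self._clock() - self._started
        return max(0.0, MAX_SESSION_SECONDS - elapsed)

    def should_reconnect(self) -> bool:
        """True once the remaining time drops to the safety margin."""
        return self.remaining() <= self._margin
```

Create the timer when the connection opens and check `should_reconnect()` between turns, so that a new session can be opened without cutting off a response in progress.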
Native WebSocket connection
The following configuration items are required:
Configuration item | Description |
Endpoint | China (Beijing): wss://dashscope.aliyuncs.com/api-ws/v1/realtime; International (Singapore): wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime |
Query parameter | The query parameter is `model`. It must be set to the name of the model you want to access. Example: |
Request header | Use a Bearer token for authentication: Authorization: Bearer DASHSCOPE_API_KEY, where DASHSCOPE_API_KEY is the API key that you requested on Model Studio. |
# pip install websocket-client
import json
import os

import websocket

API_KEY = os.getenv("DASHSCOPE_API_KEY")
API_URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime?model=qwen3-omni-flash-realtime"

headers = [
    "Authorization: Bearer " + API_KEY
]

def on_open(ws):
    print(f"Connected to server: {API_URL}")

def on_message(ws, message):
    data = json.loads(message)
    print("Received event:", json.dumps(data, indent=2))

def on_error(ws, error):
    print("Error:", error)

ws = websocket.WebSocketApp(
    API_URL,
    header=headers,
    on_open=on_open,
    on_message=on_message,
    on_error=on_error
)
ws.run_forever()

DashScope SDK
# SDK version 1.23.9 or later
import os
import json
from dashscope.audio.qwen_omni import OmniRealtimeConversation, OmniRealtimeCallback
import dashscope

# The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured an API key, change the following line to dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

class PrintCallback(OmniRealtimeCallback):
    def on_open(self) -> None:
        print("Connected Successfully")

    def on_event(self, response: dict) -> None:
        print("Received event:")
        print(json.dumps(response, indent=2, ensure_ascii=False))

    def on_close(self, close_status_code: int, close_msg: str) -> None:
        print(f"Connection closed (code={close_status_code}, msg={close_msg}).")

callback = PrintCallback()
conversation = OmniRealtimeConversation(
    model="qwen3-omni-flash-realtime",
    callback=callback,
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    url="wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
)
try:
    conversation.connect()
    print("Conversation started. Press Ctrl+C to exit.")
    conversation.thread.join()
except KeyboardInterrupt:
    conversation.close()

// SDK version 2.20.9 or later
import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import java.util.concurrent.CountDownLatch;

public class Main {
    public static void main(String[] args) throws InterruptedException, NoApiKeyException {
        CountDownLatch latch = new CountDownLatch(1);
        OmniRealtimeParam param = OmniRealtimeParam.builder()
                .model("qwen3-omni-flash-realtime")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                .build();
        OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, new OmniRealtimeCallback() {
            @Override
            public void onOpen() {
                System.out.println("Connected Successfully");
            }

            @Override
            public void onEvent(JsonObject message) {
                System.out.println(message);
            }

            @Override
            public void onClose(int code, String reason) {
                System.out.println("connection closed code: " + code + ", reason: " + reason);
                latch.countDown();
            }
        });
        conversation.connect();
        latch.await();
        conversation.close(1000, "bye");
        System.exit(0);
    }
}

2. Configure the session
Send the session.update client event:
{
    // The ID of this event, generated by the client.
    "event_id": "event_ToPZqeobitzUJnt3QqtWg",
    // The event type. This is fixed to session.update.
    "type": "session.update",
    // Session configuration.
    "session": {
        // The output modalities. Supported values are ["text"] (text only) or ["text", "audio"] (text and audio).
        "modalities": [
            "text",
            "audio"
        ],
        // The voice for the output audio.
        "voice": "Cherry",
        // The input audio format. Only pcm16 is supported.
        "input_audio_format": "pcm16",
        // The output audio format. Only pcm24 is supported.
        "output_audio_format": "pcm24",
        // The system message. It is used to set the model's goal or role.
        "instructions": "You are an AI customer service agent for a five-star hotel. Answer customer inquiries about room types, facilities, prices, and booking policies accurately and friendly. Always respond with a professional and helpful attitude. Do not provide unconfirmed information or information beyond the scope of the hotel's services.",
        // Specifies whether to enable voice activity detection. To enable it, pass a configuration object. The server will automatically detect the start and end of speech based on this object.
        // Set to null to let the client decide when to initiate a model response.
        "turn_detection": {
            // The VAD type. It must be set to server_vad.
            "type": "server_vad",
            // The VAD detection threshold. Increase this value in noisy environments and decrease it in quiet environments.
            "threshold": 0.5,
            // The duration of silence to detect the end of speech. If this value is exceeded, a model response is triggered.
            "silence_duration_ms": 800
        }
    }
}

3. Input audio and images
The client sends Base64-encoded audio and image data to the server buffer using the input_audio_buffer.append and input_image_buffer.append events. Audio input is required. Image input is optional.
Images can be from local files or captured in real time from a video stream.
When server-side Voice Activity Detection (VAD) is enabled, the server automatically submits the buffered data and triggers a response when it detects the end of speech. When VAD is disabled (manual mode), the client must send the input_audio_buffer.commit event to submit the data.
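The buffering step described above can be sketched in Python as follows. This is a hedged example rather than a reference implementation: the helper names, the 3200-byte chunk size, and the payload field names `audio` and `image` are assumptions that are not confirmed by this page, so check them against the client events reference before use.

```python
import base64
import json
import uuid


def _event(event_type: str, **fields) -> str:
    """Serialize a client event with a client-generated event_id."""
    payload = {"event_id": f"event_{uuid.uuid4().hex}", "type": event_type}
    payload.update(fields)
    return json.dumps(payload)


def audio_append_events(pcm: bytes, chunk_bytes: int = 3200) -> list[str]:
    """Split raw PCM16 audio into chunks and wrap each in an append event."""
    return [
        _event(
            "input_audio_buffer.append",
            # Assumption: the Base64 audio is carried in an "audio" field.
            audio=base64.b64encode(pcm[i:i + chunk_bytes]).decode("ascii"),
        )
        for i in range(0, len(pcm), chunk_bytes)
    ]


def image_append_event(image_bytes: bytes) -> str:
    """Wrap one encoded image frame (assumed field name: "image")."""
    return _event(
        "input_image_buffer.append",
        image=base64.b64encode(image_bytes).decode("ascii"),
    )


def commit_event() -> str:
    """Manual mode only: submit the buffered audio and images."""
    return _event("input_audio_buffer.commit")
```

Each returned string can be passed to the `send` method of an open WebSocket connection. In VAD mode the commit event is not needed because the server submits the buffer itself.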
4. Receive model responses
The format of the model response depends on the configured output modalities.
Text only
Receive streaming text through the response.text.delta event. Get the full text with the response.text.done event.
Text and audio
Text: Receive streaming text through the response.audio_transcript.delta event. Get the full text with the response.audio_transcript.done event.
Audio: Get Base64-encoded streaming audio output data through the response.audio.delta event. The response.audio.done event indicates that the audio data generation is complete.
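One way to consume these events is to dispatch on the event type and accumulate the pieces. The sketch below is illustrative only: `ResponseCollector` is a hypothetical helper, and it assumes that each delta event carries its incremental payload in a `delta` field, which this page does not state.

```python
import base64


class ResponseCollector:
    """Accumulates streamed text and audio from server events (illustrative)."""

    TEXT_DELTAS = {"response.text.delta", "response.audio_transcript.delta"}
    TEXT_DONE = {"response.text.done", "response.audio_transcript.done"}

    def __init__(self) -> None:
        self.text_parts: list[str] = []
        self.audio = bytearray()
        self.text_done = False
        self.audio_done = False

    def handle(self, event: dict) -> None:
        """Update state from one decoded server event; ignore other types."""
        event_type = event.get("type")
        if event_type in self.TEXT_DELTAS:
            # Assumption: the incremental text is in the "delta" field.
            self.text_parts.append(event.get("delta", ""))
        elif event_type in self.TEXT_DONE:
            self.text_done = True
        elif event_type == "response.audio.delta":
            # Assumption: the Base64 audio chunk is in the "delta" field.
            self.audio.extend(base64.b64decode(event.get("delta", "")))
        elif event_type == "response.audio.done":
            self.audio_done = True

    @property
    def text(self) -> str:
        return "".join(self.text_parts)
```

Call `handle(json.loads(message))` from the WebSocket `on_message` callback. The same collector works for both output modalities because it accepts either pair of text events.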
Model availability
Qwen3-Omni-Flash-Realtime is the latest real-time multimodal model in the Qwen series. Compared to the previous generation model Qwen-Omni-Turbo-Realtime, which will no longer be updated, Qwen3-Omni-Flash-Realtime has the following advantages:
Supported languages
The number of supported languages has increased to 10, including Chinese (Mandarin and dialects such as Shanghainese, Cantonese, and Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, and Korean. Qwen-Omni-Turbo-Realtime supports only two languages: Chinese (Mandarin) and English.
Supported voices
qwen3-omni-flash-realtime-2025-12-01 supports 49 voices. qwen3-omni-flash-realtime-2025-09-15 and qwen3-omni-flash-realtime support 17 voices. Qwen-Omni-Turbo-Realtime supports only 4. For more information, see Voice list.
International (Singapore)
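Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota |
qwen3-omni-flash-realtime (equivalent to qwen3-omni-flash-realtime-2025-09-15) | Stable | 65,536 | 49,152 | 16,384 | 1 million tokens each, regardless of modality. Valid for 90 days after you activate Model Studio. |
qwen3-omni-flash-realtime-2025-12-01 | Snapshot | 65,536 | 49,152 | 16,384 | 1 million tokens each, regardless of modality. Valid for 90 days after you activate Model Studio. |
qwen3-omni-flash-realtime-2025-09-15 | Snapshot | 65,536 | 49,152 | 16,384 | 1 million tokens each, regardless of modality. Valid for 90 days after you activate Model Studio. |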
China (Beijing)
Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota |
qwen3-omni-flash-realtime (equivalent to qwen3-omni-flash-realtime-2025-09-15) | Stable | 65,536 | 49,152 | 16,384 | No free quota |
qwen3-omni-flash-realtime-2025-12-01 | Snapshot | 65,536 | 49,152 | 16,384 | No free quota |
qwen3-omni-flash-realtime-2025-09-15 | Snapshot | 65,536 | 49,152 | 16,384 | No free quota |
Getting started
Create an API key and export the API key as an environment variable.
Choose a programming language you are familiar with and follow the steps below to quickly start a real-time conversation with Qwen-Omni-Realtime.
DashScope Python SDK
Prepare the runtime environment
Your Python version must be 3.10 or later.
First, install pyaudio based on your operating system.
macOS
brew install portaudio && pip install pyaudio
Debian/Ubuntu
If you are not using a virtual environment, you can install it directly using the system package manager:
sudo apt-get install python3-pyaudio
If you are in a virtual environment, you must first install the compilation dependencies:
sudo apt update
sudo apt install -y python3-dev portaudio19-dev
Then, install it using pip in the activated virtual environment:
pip install pyaudio
CentOS
sudo yum install -y portaudio portaudio-devel && pip install pyaudio
Windows
pip install pyaudio
After the installation is complete, install the dependencies using pip:
pip install websocket-client dashscope
Choose an interaction mode
VAD mode (automatically detects the start and end of speech)
The server automatically determines when the user starts and stops speaking and responds accordingly.
Manual mode (press to talk, release to send)
The client controls the start and end of speech. After the user finishes speaking, the client must actively send a message to the server.
VAD mode
Create a new Python file named vad_dash.py and copy the following code into the file:
Run vad_dash.py to have a real-time conversation with Qwen-Omni-Realtime through your microphone. The system detects the start and end of your speech and automatically sends it to the server without manual intervention.
Manual mode
Create a new Python file named manual_dash.py and copy the following code into the file:
Run manual_dash.py, press Enter to start speaking, and press Enter again to receive the model's audio response.
DashScope Java SDK
Choose an interaction mode
VAD mode (automatically detects the start and end of speech)
The Realtime API automatically determines when the user starts and stops speaking and responds accordingly.
Manual mode (press to talk, release to send)
The client controls the start and end of speech. After the user finishes speaking, the client must actively send a message to the server.
VAD mode
Run the OmniServerVad.main() method to have a real-time conversation with Qwen-Omni-Realtime through your microphone. The system detects the start and end of your speech and automatically sends it to the server without manual intervention.
Manual mode
Run the OmniWithoutServerVad.main() method. Press Enter to start recording. During recording, press Enter again to stop recording and send the audio. The model's response is then received and played.
WebSocket (Python)
Prepare the runtime environment
Your Python version must be 3.10 or later.
First, install pyaudio based on your operating system.
macOS
brew install portaudio && pip install pyaudio
Debian/Ubuntu
sudo apt-get install python3-pyaudio
or
pip install pyaudio
We recommend using pip install pyaudio. If the installation fails, first install the portaudio dependency for your operating system.
CentOS
sudo yum install -y portaudio portaudio-devel && pip install pyaudio
Windows
pip install pyaudio
After the installation is complete, install the websocket-related dependencies using pip:
pip install websockets==15.0.1
Create the client
Create a new Python file named omni_realtime_client.py in your local directory and copy the following code into the file:
Choose an interaction mode
VAD mode (automatically detects the start and end of speech)
The Realtime API automatically determines when the user starts and stops speaking and responds accordingly.
Manual mode (press to talk, release to send)
The client controls the start and end of speech. After the user finishes speaking, the client must actively send a message to the server.
VAD mode
In the same directory as omni_realtime_client.py, create another Python file named vad_mode.py and copy the following code into the file:
Run vad_mode.py to have a real-time conversation with Qwen-Omni-Realtime through your microphone. The system detects the start and end of your speech and automatically sends it to the server without manual intervention.
Manual mode
In the same directory as omni_realtime_client.py, create another Python file named manual_mode.py and copy the following code into the file:
Run manual_mode.py, press Enter to start speaking, and press Enter again to receive the model's audio response.
Interaction flow
VAD mode
To enable VAD mode, set session.turn_detection in the session.update event to a configuration object whose type is "server_vad". In this mode, the server automatically detects the start and end of speech and responds accordingly. This mode is suitable for voice call scenarios.
The interaction flow is as follows:
The server detects the start of speech and sends the input_audio_buffer.speech_started event.
The client can send input_audio_buffer.append and input_image_buffer.append events at any time to append audio and images to the buffer.
Before sending an input_image_buffer.append event, you must send at least one input_audio_buffer.append event.
The server detects the end of speech and sends the input_audio_buffer.speech_stopped event.
The server sends the input_audio_buffer.committed event to commit the audio buffer.
The server sends a conversation.item.created event, which contains the user message item created from the buffer.
Lifecycle | Client events | Server events |
Session initialization | session.update: Session configuration | Session created; Session configuration updated |
User audio input | input_audio_buffer.append: Add audio to the buffer; input_image_buffer.append: Add an image to the buffer | input_audio_buffer.speech_started: Speech start detected; input_audio_buffer.speech_stopped: Speech end detected; input_audio_buffer.committed: Server received the submitted audio |
Server audio output | None | Server starts generating a response; New output content during response; conversation.item.created: Conversation item created; New output content added to the assistant message; response.audio_transcript.delta: Incrementally generated transcribed text; response.audio.delta: Incrementally generated audio from the model; response.audio_transcript.done: Text transcription complete; response.audio.done: Audio generation complete; Streaming of text or audio content for the assistant message is complete; Streaming of the entire output item for the assistant message is complete; Response complete |
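The session.update example in step 2 contains comments, so it cannot be sent as-is. The following sketch builds the same configuration as standard JSON. The function name and its parameters are hypothetical, and the field values simply mirror the example in step 2. Passing `use_vad=False` produces a null `turn_detection`, which corresponds to Manual mode.

```python
import json
import uuid


def build_session_update(use_vad: bool, voice: str = "Cherry",
                         threshold: float = 0.5,
                         silence_duration_ms: int = 800) -> str:
    """Build a session.update event as a JSON string."""
    turn_detection = (
        {
            "type": "server_vad",
            "threshold": threshold,
            "silence_duration_ms": silence_duration_ms,
        }
        if use_vad
        else None  # serialized as JSON null: the client drives each turn
    )
    event = {
        "event_id": f"event_{uuid.uuid4().hex}",
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "voice": voice,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm24",
            "turn_detection": turn_detection,
        },
    }
    return json.dumps(event)
```

Send the returned string once after the connection opens and before any audio is appended.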
Manual mode
Set session.turn_detection in the session.update event to null to enable Manual mode. In this mode, the client requests a server response by explicitly sending the input_audio_buffer.commit and response.create events. This mode is suitable for push-to-talk scenarios, such as sending voice messages in chat applications.
The interaction flow is as follows:
The client can send input_audio_buffer.append and input_image_buffer.append events at any time to append audio and images to the buffer.
Before sending an input_image_buffer.append event, you must send at least one input_audio_buffer.append event.
The client sends the input_audio_buffer.commit event to commit the audio and image buffers. This informs the server that all user input, including audio and images, for the current turn has been sent.
The server responds with an input_audio_buffer.committed event.
The client sends a response.create event and waits for the model's output from the server.
The server responds with a conversation.item.created event.
Lifecycle | Client events | Server events |
Session initialization | session.update: Session configuration | Session created; Session configuration updated |
User audio input | input_audio_buffer.append: Add audio to the buffer; input_image_buffer.append: Add an image to the buffer; input_audio_buffer.commit: Submit audio and images to the server; response.create: Create a model response | input_audio_buffer.committed: Server received the submitted audio |
Server audio output | Clear the audio from the buffer | Server starts generating a response; New output content during response; conversation.item.created: Conversation item created; New output content added to the assistant message item; response.audio_transcript.delta: Incrementally generated transcribed text; response.audio.delta: Incrementally generated audio from the model; response.audio_transcript.done: Text transcription complete; response.audio.done: Audio generation complete; Streaming of text or audio content for the assistant message is complete; Streaming of the entire output item for the assistant message is complete; Response complete |
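The client side of one manual-mode turn can be sketched as a single function that appends audio, commits the buffer, and then requests a response. This is an illustration under assumptions: `send_manual_turn` is a hypothetical helper, `send` stands for any callable that transmits a text frame, and the `audio` payload field name is not confirmed by this page.

```python
import base64
import json
import uuid
from typing import Callable, Iterable


def _dump(event_type: str, **fields) -> str:
    """Serialize a client event with a client-generated event_id."""
    return json.dumps(
        {"event_id": f"event_{uuid.uuid4().hex}", "type": event_type, **fields}
    )


def send_manual_turn(send: Callable[[str], None],
                     audio_chunks: Iterable[bytes]) -> int:
    """Send append events, then commit, then response.create.

    Returns the total number of events sent.
    """
    count = 0
    for chunk in audio_chunks:
        # Assumption: the Base64 audio is carried in an "audio" field.
        send(_dump("input_audio_buffer.append",
                   audio=base64.b64encode(chunk).decode("ascii")))
        count += 1
    if count == 0:
        # Audio input is required, so there is nothing to commit.
        raise ValueError("at least one audio chunk is required")
    send(_dump("input_audio_buffer.commit"))
    send(_dump("response.create"))
    return count + 2
```

With the websocket-client example from step 1, `send` could be `ws.send`. The model output then arrives through the server events listed in the table above.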
API reference
Billing and rate limiting
Billing rules
Qwen-Omni-Realtime is billed based on the number of tokens used for different input modalities, such as audio and images. For more information about billing, see Models.
Throttling
For more information about model throttling rules, see Throttling.
Error codes
If a call fails, see Error messages for troubleshooting.
Voice list
Set the voice request parameter to the value in the voice parameter column.
qwen3-omni-flash-realtime-2025-12-01
Name | voice parameter | Description | Supported languages |
Cherry | Cherry | A cheerful, friendly, and natural young woman's voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Serena | Serena | Gentle young woman | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ethan | Ethan | Standard Mandarin with a slight northern accent. A bright, warm, and energetic voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Chelsie | Chelsie | An anime-style virtual girlfriend voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Momo | Momo | A playful and cute voice to cheer you up. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Vivian | Vivian | A cool, cute, and slightly grumpy voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Moon | Moon | The dashing and carefree Yuebai | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Maia | Maia | A voice that blends intelligence and gentleness. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Kai | Kai | A spa for your ears. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Nofish | Nofish | A designer who does not use retroflex consonants. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Bella | Bella | A loli who drinks but does not practice Drunken Fist. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Jennifer | Jennifer | A premium, cinematic American English female voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ryan | Ryan | A rhythmic, dramatic voice with realism and tension. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Katerina | Katerina | A mature and rhythmic female voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Aiden | Aiden | An American young man who is a great cook. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Eldric Sage | Eldric Sage | A calm, wise, and weathered old man's voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Mia | Mia | As gentle as spring water, as quiet as the first snow | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Mochi | Mochi | A smart and precocious child's voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Bellona | Bellona | A powerful and sonorous voice with clear articulation that brings characters to life and stirs passion in the listener. The clash of swords and the thunder of hooves echo in your mind, as a world of a thousand voices unfolds in perfectly clear and resonant tones. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Vincent | Vincent | A unique, raspy voice that evokes epic tales of heroism. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Meng Xiaoji | Bunny | A little loli brimming with moe appeal. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Neil | Neil | A professional news anchor's voice with a clear, steady tone. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Elias | Elias | Explains complex topics with academic rigor and clear storytelling. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Arthur | Arthur | A rustic, weathered voice of an old storyteller. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Nini | Nini | A voice as soft and sweet as a mochi. Its drawn-out calls of 'older brother' are heart-meltingly sweet. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ebona | Ebona | A whispering voice that unlocks your deepest childhood fears. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Seren | Seren | A gentle and soothing voice to help you fall asleep. Good night and sweet dreams. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Pip | Pip | Mischievous yet full of childlike innocence. Is this the Shin-chan you remember? | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Stella | Stella | A sweet magical girl voice that is both ditsy and powerful. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Bodega | Bodega | Enthusiastic Spanish uncle | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sonrisa | Sonrisa | A warm and cheerful Latin American lady | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Alek | Alek | A Russian voice that is both cool and warm. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Dolce | Dolce | Lazy Italian uncle | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sohee | Sohee | A gentle, cheerful, and expressive Korean older sister's voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ono Anna | Ono Anna | A quirky and clever childhood friend's voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Lenn | Lenn | Rational at the core and rebellious in the details: a young German who wears a suit and listens to post-punk. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Emilien | Emilien | A romantic French gentleman | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Andre | Andre | A magnetic, natural, and calm male voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Radio Gol | Radio Gol | Football Poet Rádio Gol! Today, I will use names to call the football game for you. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Shanghai-Jada | Jada | A lively woman from Shanghai. | Chinese (Shanghainese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Beijing-Dylan | Dylan | A teenager who grew up in the hutongs of Beijing. | Chinese (Beijing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Nanjing-Li | Li | A patient yoga teacher. | Chinese (Nanjing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Shaanxi-Marcus | Marcus | A sincere and deep voice from Shaanxi. | Chinese (Shaanxi dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Minnan-Roy | Roy | A humorous, straightforward, and lively young man from Taiwan. | Chinese (Min Nan), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Tianjin-Peter | Peter | A voice for the straight man in Tianjin crosstalk. | Chinese (Tianjin dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sichuan-Sunny | Sunny | A sweet Sichuan girl's voice that will melt your heart. | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sichuan-Eric | Eric | A man from Chengdu, Sichuan, who has risen above the mundane. | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Cantonese-Rocky | Rocky | A witty and humorous male voice for online chats. | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Cantonese-Kiki | Kiki | A sweet best friend from Hong Kong. | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
qwen3-omni-flash-realtime, qwen3-omni-flash-realtime-2025-09-15
Name | voice parameter | Description | Supported languages |
Cherry | Cherry | A cheerful, friendly, and natural young woman's voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ethan | Ethan | Standard Mandarin with a slight northern accent. A bright, warm, and energetic voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Nofish | Nofish | A designer who does not use retroflex consonants. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Jennifer | Jennifer | A premium, cinematic American English female voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ryan | Ryan | A rhythmic, dramatic voice with realism and tension. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Katerina | Katerina | A mature and rhythmic female voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Elias | Elias | Explains complex topics with academic rigor and clear storytelling. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Shanghai-Jada | Jada | A lively woman from Shanghai. | Chinese (Shanghainese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Beijing-Dylan | Dylan | A teenager who grew up in the hutongs of Beijing. | Chinese (Beijing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sichuan-Sunny | Sunny | A sweet Sichuan girl's voice that will melt your heart. | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Nanjing-Li | Li | A patient yoga teacher. | Chinese (Nanjing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Shaanxi-Marcus | Marcus | A sincere and deep voice from Shaanxi. | Chinese (Shaanxi dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Minnan-Roy | Roy | A humorous, straightforward, and lively young man from Taiwan. | Chinese (Min Nan), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Tianjin-Peter | Peter | A voice for the straight man in Tianjin crosstalk. | Chinese (Tianjin dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Cantonese-Rocky | Rocky | A witty and humorous male voice for online chats. | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Cantonese-Kiki | Kiki | A sweet best friend from Hong Kong. | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sichuan-Eric | Eric | An extraordinary man from Chengdu, Sichuan | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
Qwen-Omni-Turbo-Realtime
Name | voice parameter | Description | Supported languages |
Cherry | Cherry | A cheerful, friendly, and natural young woman's voice. | Chinese, English | |
Serena | Serena | A gentle young woman's voice. | Chinese, English | |
Ethan | Ethan | Standard Mandarin with a slight northern accent. A bright, warm, and energetic voice. | Chinese, English | |
Chelsie | Chelsie | An anime-style virtual girlfriend voice. | Chinese, English |