Qwen-Omni-Realtime is a real-time audio and video chat model from the Qwen series. It can understand streaming audio and image inputs, such as continuous image frames extracted from a video stream in real time. It can also generate high-quality text and audio in real time.
Procedure
1. Establish a connection
Qwen-Omni-Realtime is accessed through the WebSocket protocol. You can establish a connection using the following Python code example or the DashScope SDK.
A single WebSocket session for Qwen-Omni-Realtime can last for a maximum of 120 minutes. After this limit is reached, the service automatically closes the connection.
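Because the service closes a session once it reaches the 120-minute limit, a long-running client may want to track session age and reconnect on its own schedule. The following is a minimal sketch of one way to do that. The class name SessionClock and the five-minute safety margin are illustrative assumptions, not part of the API.

```python
import time

MAX_SESSION_SECONDS = 120 * 60  # documented 120-minute session limit


class SessionClock:
    """Tracks how long a session has been open (hypothetical helper)."""

    def __init__(self, margin_seconds=300, now=time.monotonic):
        self._now = now                # injectable clock, useful for testing
        self._margin = margin_seconds  # assumed safety margin before the limit
        self._started = now()

    def remaining(self):
        """Seconds left before the documented limit is reached."""
        elapsed = self._now() - self._started
        return max(0.0, MAX_SESSION_SECONDS - elapsed)

    def should_reconnect(self):
        """True once the session is within the safety margin of the limit."""
        return self.remaining() <= self._margin
```

A client could create a SessionClock when the connection opens and check should_reconnect() between turns, so that it opens a new session at a convenient moment rather than mid-response.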
Native WebSocket connection
The following configuration items are required:
Configuration item | Description |
Endpoint | China (Beijing): wss://dashscope.aliyuncs.com/api-ws/v1/realtime; International (Singapore): wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime |
Query parameter | The query parameter is `model`. It must be set to the name of the model you want to access. Example: |
Request header | Use Bearer token authentication: Authorization: Bearer DASHSCOPE_API_KEY. DASHSCOPE_API_KEY is the API key that you obtained from Model Studio. |
# pip install websocket-client
import json
import websocket
import os

API_KEY = os.getenv("DASHSCOPE_API_KEY")
API_URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime?model=qwen3-omni-flash-realtime"
headers = [
    "Authorization: Bearer " + API_KEY
]

def on_open(ws):
    print(f"Connected to server: {API_URL}")

def on_message(ws, message):
    data = json.loads(message)
    print("Received event:", json.dumps(data, indent=2))

def on_error(ws, error):
    print("Error:", error)

ws = websocket.WebSocketApp(
    API_URL,
    header=headers,
    on_open=on_open,
    on_message=on_message,
    on_error=on_error
)
ws.run_forever()
DashScope SDK
# SDK version 1.23.9 or later
import os
import json
from dashscope.audio.qwen_omni import OmniRealtimeConversation, OmniRealtimeCallback
import dashscope

# The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured an API key, change the following line to dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

class PrintCallback(OmniRealtimeCallback):
    def on_open(self) -> None:
        print("Connected Successfully")

    def on_event(self, response: dict) -> None:
        print("Received event:")
        print(json.dumps(response, indent=2, ensure_ascii=False))

    def on_close(self, close_status_code: int, close_msg: str) -> None:
        print(f"Connection closed (code={close_status_code}, msg={close_msg}).")

callback = PrintCallback()
conversation = OmniRealtimeConversation(
    model="qwen3-omni-flash-realtime",
    callback=callback,
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    url="wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
)
try:
    conversation.connect()
    print("Conversation started. Press Ctrl+C to exit.")
    conversation.thread.join()
except KeyboardInterrupt:
    conversation.close()
// SDK version 2.20.9 or later
import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import java.util.concurrent.CountDownLatch;

public class Main {
    public static void main(String[] args) throws InterruptedException, NoApiKeyException {
        CountDownLatch latch = new CountDownLatch(1);
        OmniRealtimeParam param = OmniRealtimeParam.builder()
                .model("qwen3-omni-flash-realtime")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                .build();
        OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, new OmniRealtimeCallback() {
            @Override
            public void onOpen() {
                System.out.println("Connected Successfully");
            }

            @Override
            public void onEvent(JsonObject message) {
                System.out.println(message);
            }

            @Override
            public void onClose(int code, String reason) {
                System.out.println("connection closed code: " + code + ", reason: " + reason);
                latch.countDown();
            }
        });
        conversation.connect();
        latch.await();
        conversation.close(1000, "bye");
        System.exit(0);
    }
}
2. Configure the session
Send the session.update client event:
{
    // The ID of this event, generated by the client.
    "event_id": "event_ToPZqeobitzUJnt3QqtWg",
    // The event type. This is fixed to session.update.
    "type": "session.update",
    // Session configuration.
    "session": {
        // The output modalities. Supported values are ["text"] (text only) or ["text", "audio"] (text and audio).
        "modalities": [
            "text",
            "audio"
        ],
        // The voice for the output audio.
        "voice": "Cherry",
        // The input audio format. Only pcm16 is supported.
        "input_audio_format": "pcm16",
        // The output audio format.
        // Qwen3-Omni-Flash-Realtime: Only supports pcm24. Qwen-Omni-Turbo-Realtime: Only supports pcm16.
        "output_audio_format": "pcm24",
        // The system message. It is used to set the model's goal or role.
        "instructions": "You are an AI customer service agent for a five-star hotel. Answer customer inquiries about room types, facilities, prices, and booking policies accurately and friendly. Always respond with a professional and helpful attitude. Do not provide unconfirmed information or information beyond the scope of the hotel's services.",
        // Specifies whether to enable voice activity detection. To enable it, pass a configuration object. The server will automatically detect the start and end of speech based on this object.
        // Set to null to let the client decide when to initiate a model response.
        "turn_detection": {
            // The VAD type. It must be set to server_vad.
            "type": "server_vad",
            // The VAD detection threshold. Increase this value in noisy environments and decrease it in quiet environments.
            "threshold": 0.5,
            // The duration of silence to detect the end of speech. If this value is exceeded, a model response is triggered.
            "silence_duration_ms": 800
        }
    }
}
3. Input audio and images
The client sends Base64-encoded audio and image data to the server buffer using the input_audio_buffer.append and input_image_buffer.append events. Audio input is required. Image input is optional.
Images can be from local files or captured in real time from a video stream.
When server-side Voice Activity Detection (VAD) is enabled, the server automatically submits the data and triggers a response when it detects the end of speech. When VAD is disabled (manual mode), the client must send the input_audio_buffer.commit event to submit the data.
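The events described above can be sketched as plain dictionaries that are serialized to JSON before being sent over the WebSocket. This is a minimal illustration, not the official client. The payload field names "audio" and "image" and the helper function names are assumptions; only the event type names come from this document.

```python
import base64
import uuid


def make_append_event(event_type, field, raw_bytes):
    """Build one buffer-append event carrying Base64-encoded data."""
    return {
        "event_id": "event_" + uuid.uuid4().hex,  # client-generated ID
        "type": event_type,
        field: base64.b64encode(raw_bytes).decode("ascii"),
    }


def audio_append(pcm_chunk):
    # "audio" as the payload field name is an assumption.
    return make_append_event("input_audio_buffer.append", "audio", pcm_chunk)


def image_append(image_bytes):
    # "image" as the payload field name is an assumption.
    return make_append_event("input_image_buffer.append", "image", image_bytes)


def commit_event():
    """Manual mode only: signals that the buffered input is complete."""
    return {
        "event_id": "event_" + uuid.uuid4().hex,
        "type": "input_audio_buffer.commit",
    }
```

For example, audio_append(chunk) could be called for each microphone chunk and the result passed through json.dumps before sending. In VAD mode, commit_event() is not needed because the server submits the buffer itself.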
4. Receive model responses
The format of the model response depends on the configured output modalities.
Text only
Receive streaming text through the response.text.delta event. Get the full text with the response.text.done event.
Text and audio
Text: Receive streaming text through the response.audio_transcript.delta event. Get the full text with the response.audio_transcript.done event.
Audio: Get Base64-encoded streaming audio output data through the response.audio.delta event. The response.audio.done event indicates that the audio data generation is complete.
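One way to consume these events is a small collector that joins text deltas and decodes audio deltas as they arrive. The sketch below assumes that each delta event carries its payload in a field named "delta". That field name and the class name ResponseCollector are assumptions, so check them against the API reference before relying on them.

```python
import base64


class ResponseCollector:
    """Accumulates streamed text and audio from server events (illustrative)."""

    TEXT_DELTAS = {"response.text.delta", "response.audio_transcript.delta"}
    TEXT_DONE = {"response.text.done", "response.audio_transcript.done"}

    def __init__(self):
        self.text_parts = []
        self.audio = bytearray()
        self.text_done = False
        self.audio_done = False

    def handle(self, event):
        """Route one parsed server event by its type."""
        kind = event.get("type")
        if kind in self.TEXT_DELTAS:
            # "delta" as the payload field name is an assumption.
            self.text_parts.append(event.get("delta", ""))
        elif kind in self.TEXT_DONE:
            self.text_done = True
        elif kind == "response.audio.delta":
            # Audio arrives Base64-encoded; decode to raw bytes for playback.
            self.audio.extend(base64.b64decode(event.get("delta", "")))
        elif kind == "response.audio.done":
            self.audio_done = True

    @property
    def text(self):
        return "".join(self.text_parts)
```

The handle method can be called from an on_message callback after json.loads. Events of other types are ignored, so the same collector works for both output modality settings.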
Availability
Region availability
Model availability
Qwen3-Omni-Flash-Realtime is the latest real-time multimodal model in the Qwen series. Compared to the previous generation model Qwen-Omni-Turbo-Realtime, which will no longer be updated, Qwen3-Omni-Flash-Realtime has the following advantages:
Supported languages
The number of supported languages has increased to 10, including Chinese (Mandarin and dialects such as Shanghainese, Cantonese, and Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, and Korean. Qwen-Omni-Turbo-Realtime supports only two languages: Chinese (Mandarin) and English.
Supported voices
qwen3-omni-flash-realtime-2025-12-01 supports 49 voices. qwen3-omni-flash-realtime-2025-09-15 and qwen3-omni-flash-realtime support 17. Qwen-Omni-Turbo-Realtime supports only 4. For more information, see Voice list.
International (Singapore)
Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota |
qwen3-omni-flash-realtime (currently qwen3-omni-flash-realtime-2025-12-01) | Stable | 65,536 | 49,152 | 16,384 | 1 million tokens (regardless of modality), valid for 90 days after activating Model Studio |
qwen3-omni-flash-realtime-2025-12-01 | Snapshot | ||||
qwen3-omni-flash-realtime-2025-09-15 | |||||
China (Beijing)
Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota |
qwen3-omni-flash-realtime (currently has the same capabilities as qwen3-omni-flash-realtime-2025-12-01) | Stable | 65,536 | 49,152 | 16,384 | No free quota |
qwen3-omni-flash-realtime-2025-12-01 | Snapshot | ||||
qwen3-omni-flash-realtime-2025-09-15 | |||||
Getting started
Get an API key and export the API key as an environment variable.
Choose a programming language you are familiar with and follow the steps below to quickly start a real-time conversation with Qwen-Omni-Realtime.
DashScope Python SDK
Prepare the runtime environment
Your Python version must be 3.10 or later.
First, install pyaudio based on your operating system.
macOS
brew install portaudio && pip install pyaudio
Debian/Ubuntu
If you are not using a virtual environment, you can install it directly using the system package manager:
sudo apt-get install python3-pyaudio
If you are in a virtual environment, you must first install the compilation dependencies:
sudo apt update
sudo apt install -y python3-dev portaudio19-dev
Then, install it using pip in the activated virtual environment:
pip install pyaudio
CentOS
sudo yum install -y portaudio portaudio-devel && pip install pyaudio
Windows
pip install pyaudio
After the installation is complete, install the dependencies using pip:
pip install websocket-client dashscope
Choose an interaction mode
VAD mode (automatically detects the start and end of speech)
The server automatically determines when the user starts and stops speaking and responds accordingly.
Manual mode (press to talk, release to send)
The client controls the start and end of speech. After the user finishes speaking, the client must actively send a message to the server.
VAD mode
Create a new Python file named vad_dash.py and copy the following code into the file:
Run vad_dash.py to have a real-time conversation with Qwen-Omni-Realtime through your microphone. The system detects the start and end of your speech and automatically sends it to the server without manual intervention.
Manual mode
Create a new Python file named manual_dash.py and copy the following code into the file:
Run manual_dash.py, press Enter to start speaking, and press Enter again to receive the model's audio response.
DashScope Java SDK
Choose an interaction mode
VAD mode (automatically detects the start and end of speech)
The Realtime API automatically determines when the user starts and stops speaking and responds accordingly.
Manual mode (press to talk, release to send)
The client controls the start and end of speech. After the user finishes speaking, the client must actively send a message to the server.
VAD mode
Run the OmniServerVad.main() method to have a real-time conversation with Qwen-Omni-Realtime through your microphone. The system detects the start and end of your speech and automatically sends it to the server without manual intervention.
Manual mode
Run the OmniWithoutServerVad.main() method. Press Enter to start recording. During recording, press Enter again to stop recording and send the audio. The model's response is then received and played.
WebSocket (Python)
Prepare the runtime environment
Your Python version must be 3.10 or later.
First, install pyaudio based on your operating system.
macOS
brew install portaudio && pip install pyaudio
Debian/Ubuntu
sudo apt-get install python3-pyaudio
or
pip install pyaudio
We recommend using pip install pyaudio. If the installation fails, first install the portaudio dependency for your operating system.
CentOS
sudo yum install -y portaudio portaudio-devel && pip install pyaudio
Windows
pip install pyaudio
After the installation is complete, install the websocket-related dependencies using pip:
pip install websockets==15.0.1
Create the client
Create a new Python file named omni_realtime_client.py in your local directory and copy the following code into the file:
Choose an interaction mode
VAD mode (automatically detects the start and end of speech)
The Realtime API automatically determines when the user starts and stops speaking and responds accordingly.
Manual mode (press to talk, release to send)
The client controls the start and end of speech. After the user finishes speaking, the client must actively send a message to the server.
VAD mode
In the same directory as omni_realtime_client.py, create another Python file named vad_mode.py and copy the following code into the file:
Run vad_mode.py to have a real-time conversation with Qwen-Omni-Realtime through your microphone. The system detects the start and end of your speech and automatically sends it to the server without manual intervention.
Manual mode
In the same directory as omni_realtime_client.py, create another Python file named manual_mode.py and copy the following code into the file:
Run manual_mode.py, press Enter to start speaking, and press Enter again to receive the model's audio response.
Interaction flow
VAD mode
Set the type field of session.turn_detection in the session.update event to "server_vad" to enable VAD mode. In this mode, the server automatically detects the start and end of speech and responds accordingly. This mode is suitable for voice call scenarios.
The interaction flow is as follows:
The server detects the start of speech and sends the input_audio_buffer.speech_started event.
The client can send input_audio_buffer.append and input_image_buffer.append events at any time to append audio and images to the buffer.
Before sending an input_image_buffer.append event, you must send at least one input_audio_buffer.append event.
The server detects the end of speech and sends the input_audio_buffer.speech_stopped event.
The server sends the input_audio_buffer.committed event to commit the audio buffer.
The server sends a conversation.item.created event, which contains the user message item created from the buffer.
Lifecycle | Client events | Server events |
Session initialization | session.update: session configuration | Session created; session configuration updated |
User audio input | input_audio_buffer.append: add audio to the buffer; input_image_buffer.append: add an image to the buffer | input_audio_buffer.speech_started: speech start detected; input_audio_buffer.speech_stopped: speech end detected; input_audio_buffer.committed: server received the submitted audio |
Server audio output | None | Server starts generating a response; new output content during response; conversation.item.created: conversation item created; new output content added to the assistant message; response.audio_transcript.delta: incrementally generated transcribed text; response.audio.delta: incrementally generated audio from the model; response.audio_transcript.done: text transcription complete; response.audio.done: audio generation complete; streaming of text or audio content for the assistant message is complete; streaming of the entire output item for the assistant message is complete; response complete |
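The VAD-mode setup and the documented server event order can be sketched as follows. The turn_detection fields are taken from the session.update example in step 2. Sending a session object that contains only turn_detection is an assumption, and the helper names are hypothetical.

```python
import uuid


def vad_session_update(threshold=0.5, silence_duration_ms=800):
    """Build a session.update event that enables server-side VAD."""
    return {
        "event_id": "event_" + uuid.uuid4().hex,  # client-generated ID
        "type": "session.update",
        "session": {
            # Assumption: a partial session object is accepted.
            "turn_detection": {
                "type": "server_vad",
                "threshold": threshold,
                "silence_duration_ms": silence_duration_ms,
            }
        },
    }


# Server events for one user turn, in the order listed in the flow above.
VAD_SERVER_EVENT_ORDER = [
    "input_audio_buffer.speech_started",
    "input_audio_buffer.speech_stopped",
    "input_audio_buffer.committed",
    "conversation.item.created",
]


def follows_vad_order(event_types):
    """Check that the first occurrence of each event appears in that order."""
    positions = []
    for name in VAD_SERVER_EVENT_ORDER:
        if name not in event_types:
            return False
        positions.append(event_types.index(name))
    return positions == sorted(positions)
```

A check like follows_vad_order can be useful in integration tests, where the event types received during one turn are logged and then compared against the expected sequence.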
Manual mode
Set session.turn_detection in the session.update event to null to enable Manual mode. In this mode, the client requests a server response by explicitly sending the input_audio_buffer.commit and response.create events. This mode is suitable for push-to-talk scenarios, such as sending voice messages in chat applications.
The interaction flow is as follows:
The client can send input_audio_buffer.append and input_image_buffer.append events at any time to append audio and images to the buffer.
Before sending an input_image_buffer.append event, you must send at least one input_audio_buffer.append event.
The client sends the input_audio_buffer.commit event to commit the audio and image buffers. This informs the server that all user input, including audio and images, for the current turn has been sent.
The server responds with an input_audio_buffer.committed event.
The client sends a response.create event and waits for the model's output from the server.
The server responds with a conversation.item.created event.
Lifecycle | Client events | Server events |
Session initialization | session.update: session configuration | Session created; session configuration updated |
User audio input | input_audio_buffer.append: add audio to the buffer; input_image_buffer.append: add an image to the buffer; input_audio_buffer.commit: submit audio and images to the server; response.create: create a model response | input_audio_buffer.committed: server received the submitted audio |
Server audio output | Clear the audio from the buffer | Server starts generating a response; new output content during response; conversation.item.created: conversation item created; new output content added to the assistant message item; response.audio_transcript.delta: incrementally generated transcribed text; response.audio.delta: incrementally generated audio from the model; response.audio_transcript.done: text transcription complete; response.audio.done: audio generation complete; streaming of text or audio content for the assistant message is complete; streaming of the entire output item for the assistant message is complete; response complete |
API reference
Billing and rate limiting
Billing rules
Qwen-Omni-Realtime is billed based on the number of tokens used for different input modalities, such as audio and images. For more information about billing, see Models.
Throttling
For more information about model throttling rules, see Throttling.
Error codes
If the model call fails and returns an error message, see Error messages for resolution.
Voice list
Set the voice request parameter to the value in the voice parameter column.
qwen3-omni-flash-realtime-2025-12-01
Voice name | voice parameter | Description | Languages supported |
Cherry | Cherry | A sunny, positive, friendly, and natural young woman | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Serena | Serena | A gentle young woman | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ethan | Ethan | Standard Mandarin with a slight northern accent. Sunny, warm, energetic, and vibrant | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Chelsie | Chelsie | A two-dimensional virtual girlfriend | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Momo | Momo | Playful and mischievous, cheering you up | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Vivian | Vivian | Confident, cute, and slightly feisty | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Moon | Moon | Effortlessly cool Moon White | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Maia | Maia | A blend of intellect and gentleness | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Kai | Kai | A soothing audio spa for your ears | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Nofish | Nofish | A designer who cannot pronounce retroflex sounds | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Bella | Bella | A little girl who drinks but never throws punches when drunk | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Jennifer | Jennifer | A premium, cinematic-quality American English female voice | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ryan | Ryan | Full of rhythm, bursting with dramatic flair, balancing authenticity and tension | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Katerina | Katerina | A mature-woman voice with rich, memorable rhythm | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Aiden | Aiden | An American English young man skilled in cooking | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Eldric Sage | Eldric Sage | A calm and wise elder—weathered like a pine tree, yet clear-minded as a mirror | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Mia | Mia | Gentle as spring water, obedient as fresh snow | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Mochi | Mochi | A clever, quick-witted young adult—childlike innocence remains, yet wisdom shines through | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Bellona | Bellona | A powerful, clear voice that brings characters to life—so stirring it makes your blood boil. With heroic grandeur and perfect diction, this voice captures the full spectrum of human expression. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Vincent | Vincent | A uniquely raspy, smoky voice—just one line evokes armies and heroic tales | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Bunny | Bunny | A little girl overflowing with "cuteness" | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Neil | Neil | A flat baseline intonation with precise, clear pronunciation—the most professional news anchor | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Elias | Elias | Maintains academic rigor while using storytelling techniques to turn complex knowledge into digestible learning modules | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Arthur | Arthur | A simple, earthy voice steeped in time and tobacco smoke—slowly unfolding village stories and curiosities | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Nini | Nini | A soft, clingy voice like sweet rice cakes—those drawn-out calls of “Big Brother” are so sweet they melt your bones | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ebona | Ebona | Her whisper is like a rusty key slowly turning in the darkest corner of your mind—where childhood shadows and unknown fears hide | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Seren | Seren | A gentle, soothing voice to help you fall asleep faster. Good night, sweet dreams | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Pip | Pip | A playful, mischievous boy full of childlike wonder—is this your memory of Shin-chan? | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Stella | Stella | Normally a cloyingly sweet, dazed teenage-girl voice—but when shouting “I represent the moon to defeat you!”, she instantly radiates unwavering love and justice | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Bodega | Bodega | A passionate Spanish man | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sonrisa | Sonrisa | A cheerful, outgoing Latin American woman | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Alek | Alek | Cold like the Russian spirit, yet warm like wool coat lining | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Dolce | Dolce | A laid-back Italian man | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sohee | Sohee | A warm, cheerful, emotionally expressive Korean unnie | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ono Anna | Ono Anna | A clever, spirited childhood friend | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Lenn | Lenn | Rational at heart, rebellious in detail—a German youth who wears suits and listens to post-punk | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Emilien | Emilien | A romantic French big brother | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Andre | Andre | A magnetic, natural, and steady male voice | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Radio Gol | Radio Gol | Football poet Radio Gol! Today I’ll commentate on football using my name. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Shanghai - Jada | Jada | A fast-paced, energetic Shanghai auntie | Shanghainese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Beijing - Dylan | Dylan | A young man raised in Beijing’s hutongs | Beijing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Nanjing - Li | Li | A patient yoga teacher | Nanjing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Shaanxi - Marcus | Marcus | Broad face, few words, sincere heart, deep voice—the authentic Shaanxi flavor | Shaanxi dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Southern Min - Roy | Roy | A humorous, straightforward, lively Taiwanese guy | Southern Min, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Tianjin - Peter | Peter | Tianjin-style crosstalk, professional foil | Tianjin dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sichuan - Sunny | Sunny | A Sichuan girl sweet enough to melt your heart | Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sichuan - Eric | Eric | A Sichuanese man from Chengdu who stands out in everyday life | Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Cantonese - Rocky | Rocky | A humorous, witty A Qiang providing live chat | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Cantonese - Kiki | Kiki | A sweet Hong Kong girl best friend | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
qwen3-omni-flash-realtime, qwen3-omni-flash-realtime-2025-09-15
Voice name | voice parameter | Description | Languages supported |
Cherry | Cherry | A sunny, positive, friendly, and natural young woman | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ethan | Ethan | Standard Mandarin with a slight northern accent. Sunny, warm, energetic, and vibrant | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Nofish | Nofish | A designer who cannot pronounce retroflex sounds | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Jennifer | Jennifer | A premium, cinematic-quality American English female voice | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Ryan | Ryan | Full of rhythm, bursting with dramatic flair, balancing authenticity and tension | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Katerina | Katerina | A mature-woman voice with rich, memorable rhythm | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Elias | Elias | Maintains academic rigor while using storytelling techniques to turn complex knowledge into digestible learning modules | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Shanghai - Jada | Jada | A fast-paced, energetic Shanghai auntie | Shanghainese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Beijing - Dylan | Dylan | A young man raised in Beijing’s hutongs | Beijing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sichuan - Sunny | Sunny | A Sichuan girl sweet enough to melt your heart | Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Nanjing - Li | Li | A patient yoga teacher | Nanjing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Shaanxi - Marcus | Marcus | Broad face, few words, sincere heart, deep voice—the authentic Shaanxi flavor | Shaanxi dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Southern Min - Roy | Roy | A humorous, straightforward, lively Taiwanese guy | Southern Min, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Tianjin - Peter | Peter | Tianjin-style crosstalk, professional foil | Tianjin dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Cantonese - Rocky | Rocky | A humorous, witty A Qiang providing live chat | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Cantonese - Kiki | Kiki | A sweet Hong Kong girl best friend | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
Sichuan - Eric | Eric | A Sichuanese man from Chengdu who stands out in everyday life | Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
Qwen-Omni-Turbo-Realtime
Voice name | voice parameter | Description | Languages supported |
Cherry | Cherry | A sunny, positive, friendly, and natural young woman | Chinese, English | |
Serena | Serena | A gentle young woman | Chinese, English | |
Ethan | Ethan | Standard Mandarin with a slight northern accent. Sunny, warm, energetic, and vibrant | Chinese, English | |
Chelsie | Chelsie | A two-dimensional virtual girlfriend | Chinese, English |