
Alibaba Cloud Model Studio: Qwen-Omni-Realtime

Last Updated: Mar 31, 2026

Qwen-Omni-Realtime is a real-time audio and video chat model in the Qwen series. It processes streaming audio and image input, such as consecutive image frames extracted in real time from a video stream, and generates high-quality text and audio output in real time.

Supported regions: Singapore, Beijing. Each region requires its own API key.

How to use

1. Establish a connection

Connect to the Qwen-Omni-Realtime model over the WebSocket protocol. You can use the Python sample code below or the DashScope SDK.

Note

A single WebSocket session lasts up to 120 minutes before it is closed automatically.
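
Because of the 120-minute session limit, a long-running client has to reconnect at some point. The following is a minimal sketch of one way to schedule a proactive reconnect. The helper name seconds_until_reconnect and the 60-second safety margin are assumptions made for illustration, not part of the API.

```python
import time

# Session limit stated in the note above: 120 minutes.
SESSION_LIMIT_S = 120 * 60


def seconds_until_reconnect(connected_at, now=None, margin_s=60.0):
    """Return the seconds left before a proactive reconnect is due.

    connected_at and now are time.monotonic() readings.
    A result of 0.0 means the client should reconnect now.
    margin_s is a hypothetical safety margin, not an API parameter.
    """
    if now is None:
        now = time.monotonic()
    return max(0.0, connected_at + SESSION_LIMIT_S - margin_s - now)
```

A client could record time.monotonic() in its on_open handler and check this value periodically, opening a new connection once it reaches 0.0.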

Native WebSocket connection

You need the following configuration items:

Configuration item    Description

Endpoint              China (Beijing): wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                      International (Singapore): wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime

Query parameter       The query parameter is model. Set it to the name of the model you want to access. Example: ?model=qwen3.5-omni-plus-realtime

Request header        Authenticate with a Bearer token: Authorization: Bearer DASHSCOPE_API_KEY
                      DASHSCOPE_API_KEY is the API key that you obtained in Model Studio.

# pip install websocket-client
import json
import websocket
import os

API_KEY=os.getenv("DASHSCOPE_API_KEY")
API_URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime?model=qwen3.5-omni-plus-realtime"

headers = [
    "Authorization: Bearer " + API_KEY
]

def on_open(ws):
    print(f"Connected to server: {API_URL}")
def on_message(ws, message):
    data = json.loads(message)
    print("Received event:", json.dumps(data, indent=2))
def on_error(ws, error):
    print("Error:", error)

ws = websocket.WebSocketApp(
    API_URL,
    header=headers,
    on_open=on_open,
    on_message=on_message,
    on_error=on_error
)

ws.run_forever()

DashScope SDK

# DashScope Python SDK version 1.23.9 or later
import os
import json
from dashscope.audio.qwen_omni import OmniRealtimeConversation,OmniRealtimeCallback
import dashscope
# The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the API key as an environment variable, replace the following line with dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

class PrintCallback(OmniRealtimeCallback):
    def on_open(self) -> None:
        print("Connected Successfully")
    def on_event(self, response: dict) -> None:
        print("Received event:")
        print(json.dumps(response, indent=2, ensure_ascii=False))
    def on_close(self, close_status_code: int, close_msg: str) -> None:
        print(f"Connection closed (code={close_status_code}, msg={close_msg}).")

callback = PrintCallback()
conversation = OmniRealtimeConversation(
    model="qwen3.5-omni-plus-realtime",
    callback=callback,
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    url="wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
)
try:
    conversation.connect()
    print("Conversation started. Press Ctrl+C to exit.")
    conversation.thread.join()
except KeyboardInterrupt:
    conversation.close()
// DashScope Java SDK version 2.20.9 or later
import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import java.util.concurrent.CountDownLatch;

public class Main {
    public static void main(String[] args) throws InterruptedException, NoApiKeyException {
        CountDownLatch latch = new CountDownLatch(1);
        OmniRealtimeParam param = OmniRealtimeParam.builder()
                .model("qwen3.5-omni-plus-realtime")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                .build();

        OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, new OmniRealtimeCallback() {
            @Override
            public void onOpen() {
                System.out.println("Connected Successfully");
            }
            @Override
            public void onEvent(JsonObject message) {
                System.out.println(message);
            }
            @Override
            public void onClose(int code, String reason) {
                System.out.println("connection closed code: " + code + ", reason: " + reason);
                latch.countDown();
            }
        });
        conversation.connect();
        latch.await();
        conversation.close(1000, "bye");
        System.exit(0);
    }
}

2. Configure the session

Send the session.update client event:

{
    // ID of this event, generated by the client.
    "event_id": "event_ToPZqeobitzUJnt3QqtWg",
    // Event type. The value is fixed to session.update.
    "type": "session.update",
    // Session configuration.
    "session": {
        // Output modalities. Supported values are ["text"] (text only) or ["text","audio"] (text and audio).
        "modalities": [
            "text",
            "audio"
        ],
        // Voice for the audio output.
        "voice": "Cherry",
        // Input audio format. Only pcm is supported.
        "input_audio_format": "pcm",
        // Output audio format. Only pcm is supported.
        "output_audio_format": "pcm",
        // System message. Sets the goal or role of the model.
        "instructions": "You are an AI customer service agent for a five-star hotel. Answer customer inquiries about room types, facilities, prices, and booking policies accurately and friendly. Always respond with a professional and helpful attitude. Do not provide unconfirmed information or information beyond the scope of the hotel's services.",
        // Enables voice activity detection (VAD). To enable it, pass a configuration object. The server then automatically detects the start and end of speech based on this object.
        // Set to null to let the client decide when to trigger the model response.
        "turn_detection": {
            // VAD type. Must be set to server_vad.
            "type": "server_vad",
            // VAD detection threshold. Raise it in noisy environments and lower it in quiet ones.
            "threshold": 0.5,
            // Duration of silence that marks the end of speech. The model response is triggered once this duration is exceeded.
            "silence_duration_ms": 800
        }
    }
}
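
The annotated event above contains comments, so it is not valid JSON as written. As a minimal sketch, the same event can be built as a Python dict and serialized with json.dumps before it is sent over the native WebSocket connection. The helper name build_session_update and the uuid-based event ID scheme are assumptions for illustration. The field names and values are taken from the event above.

```python
import json
import uuid


def build_session_update(instructions, voice="Cherry", use_server_vad=True):
    """Build a session.update event as a plain dict."""
    turn_detection = (
        {"type": "server_vad", "threshold": 0.5, "silence_duration_ms": 800}
        if use_server_vad
        else None  # Serialized as JSON null: the client decides when to respond.
    )
    return {
        # Hypothetical ID scheme; the document only says the client generates the ID.
        "event_id": "event_" + uuid.uuid4().hex,
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "voice": voice,
            "input_audio_format": "pcm",
            "output_audio_format": "pcm",
            "instructions": instructions,
            "turn_detection": turn_detection,
        },
    }


payload = json.dumps(build_session_update("You are a helpful assistant."))
# With the websocket-client connection from step 1, this would be sent as:
# ws.send(payload)
```

Passing use_server_vad=False produces "turn_detection": null, which corresponds to the manual mode described in the comments of the event.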

3. Input audio and images

Send Base64-encoded audio (required) and image (optional) data to the server buffer using the input_audio_buffer.append and input_image_buffer.append events.

Images can come from local files or be captured in real time from a video stream.
If server-side VAD is enabled, the server automatically commits the data and triggers a response when speech ends. If VAD is disabled (manual mode), the client must send the input_audio_buffer.commit event to commit the data.
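
A minimal sketch of the audio path for the native WebSocket connection: split raw PCM into chunks, Base64-encode each chunk, and wrap it in an input_audio_buffer.append event, followed by a commit event in manual mode. The payload field name "audio" is an assumption that is not confirmed by this document, so check the event reference before relying on it. The helper names are hypothetical.

```python
import base64
import json


def audio_append_events(pcm, chunk_bytes=3200):
    """Yield one input_audio_buffer.append event per PCM chunk."""
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        yield {
            "type": "input_audio_buffer.append",
            # Assumed field name for the Base64 payload.
            "audio": base64.b64encode(chunk).decode("ascii"),
        }


def commit_event():
    """Manual mode only: tell the server that the buffered input is complete."""
    return {"type": "input_audio_buffer.commit"}


pcm = bytes(8000)  # 0.25 s of silence at 16 kHz, 16-bit mono
messages = [json.dumps(event) for event in audio_append_events(pcm)]
messages.append(json.dumps(commit_event()))
# Each message would then be sent with ws.send(message).
```

With server-side VAD enabled, the commit event is not needed, because the server commits the buffer itself when it detects the end of speech.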

4. Receive the model response

The format of the model response depends on the configured output modalities.
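
As a rough sketch of handling a response when both text and audio output are enabled, the collector below accumulates streamed events into one result. It uses only the event types that appear in the SDK samples in this document (response.audio.delta, response.audio_transcript.delta, response.done). Text-only sessions may emit other event types that this sketch does not cover. The class name ResponseCollector is hypothetical.

```python
import base64


class ResponseCollector:
    """Accumulate one model response from streamed server events."""

    def __init__(self):
        self.audio = bytearray()  # Decoded PCM bytes of the spoken reply.
        self.transcript = []      # Text fragments of the spoken reply.
        self.done = False

    def handle(self, event):
        kind = event.get("type")
        if kind == "response.audio.delta":
            self.audio.extend(base64.b64decode(event["delta"]))
        elif kind == "response.audio_transcript.delta":
            self.transcript.append(event["delta"])
        elif kind == "response.done":
            self.done = True
        # Any other event type is ignored in this sketch.

    def text(self):
        return "".join(self.transcript)
```

In the native WebSocket sample, on_message could pass each decoded JSON event to handle() and read text() and audio once done is set.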

Model selection

Qwen3.5-Omni-Realtime is the latest real-time multimodal model in the Qwen series. Compared with the previous-generation model Qwen3-Omni-Flash-Realtime, it offers:

  • Intelligence level

    Significantly improved intelligence, on par with Qwen3.5-Plus.

  • Web search

    Native support for web search. The model autonomously decides whether it needs to search to answer real-time questions. For details, see Web search.

  • Semantic interruption

    Automatically identifies conversational intent to avoid interruptions caused by filler sounds and meaningless background noise.

  • Voice control

    Control the volume, speaking rate, and emotion with voice commands such as “speak faster,” “speak louder,” or “speak cheerfully.”

  • Supported languages

    Supports speech recognition in 113 languages and dialects, and speech synthesis in 36 languages and dialects.

  • Voice options

    Offers 55 voices (47 multilingual + 8 dialect-specific). See Voice list.

See Model list for model names, context, pricing, and snapshot versions. For concurrency limits, see Rate limits.

Getting started

Get an API key and configure it as an environment variable.

Choose your preferred programming language and follow the steps below to quickly start a real-time conversation with the Realtime model.

DashScope Python SDK

  • Prepare the runtime environment

Your Python version must be 3.10 or later.

First, install pyaudio for your operating system.

macOS

brew install portaudio && pip install pyaudio

Debian/Ubuntu

  • If you are not using a virtual environment, install it directly with the system package manager:

    sudo apt-get install python3-pyaudio
  • If you are using a virtual environment, first install the build dependencies:

    sudo apt update
    sudo apt install -y python3-dev portaudio19-dev

    Then install the package with pip in the activated virtual environment.

    pip install pyaudio

CentOS

sudo yum install -y portaudio portaudio-devel && pip install pyaudio

Windows

pip install pyaudio

After installing pyaudio, install the remaining dependencies with pip:

pip install websocket-client dashscope
  • Choose an interaction mode

    • VAD mode (voice activity detection, automatic detection of the start and end of speech)

      The server automatically detects when the user starts and stops speaking, and responds.

    • Manual mode (press-to-talk, release-to-send)

      The client controls turn timing. After the user finishes speaking, the client commits the audio and requests a response from the server.

    VAD mode

    Create a new Python file named vad_dash.py and copy the following code into it:

    vad_dash.py

    # Dependencies: dashscope >= 1.23.9, pyaudio
    import os
    import base64
    import time
    import pyaudio
    from dashscope.audio.qwen_omni import MultiModality, AudioFormat, OmniRealtimeCallback, OmniRealtimeConversation
    import dashscope

    # Configuration parameters: endpoint, API key, voice, model, model role
    # Specify the region. Set to 'intl' for International (Singapore) or 'cn' for China (Beijing).
    region = 'intl'
    base_domain = 'dashscope-intl.aliyuncs.com' if region == 'intl' else 'dashscope.aliyuncs.com'
    url = f'wss://{base_domain}/api-ws/v1/realtime'
    # Configure the API key. If you have not set the environment variable, replace the following line with dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv('DASHSCOPE_API_KEY')
    # Specify the voice
    voice = 'Cherry'
    # Specify the model
    model = 'qwen3.5-omni-plus-realtime'
    # Specify the model role
    instructions = "You are Xiaoyun, a personal assistant. Please answer the user's questions in a humorous and witty way."
    class SimpleCallback(OmniRealtimeCallback):
        def __init__(self, pya):
            self.pya = pya
            self.out = None
        def on_open(self):
            # Initialize the audio output stream (24 kHz, mono, 16-bit)
            self.out = self.pya.open(
                format=pyaudio.paInt16,
                channels=1,
                rate=24000,
                output=True
            )
        def on_event(self, response):
            if response['type'] == 'response.audio.delta':
                # Play the audio
                self.out.write(base64.b64decode(response['delta']))
            elif response['type'] == 'conversation.item.input_audio_transcription.completed':
                # Print the transcription of the user's speech
                print(f"[User] {response['transcript']}")
            elif response['type'] == 'response.audio_transcript.done':
                # Print the text of the assistant's reply
                print(f"[LLM] {response['transcript']}")

    # 1. Initialize the audio device
    pya = pyaudio.PyAudio()
    # 2. Create the callback and the session
    callback = SimpleCallback(pya)
    conv = OmniRealtimeConversation(model=model, callback=callback, url=url)
    # 3. Establish the connection and configure the session
    conv.connect()
    conv.update_session(output_modalities=[MultiModality.AUDIO, MultiModality.TEXT], voice=voice, instructions=instructions)
    # 4. Initialize the audio input stream (16 kHz, mono, 16-bit)
    mic = pya.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True)
    # 5. Main loop that processes the audio input
    print("Conversation started. Speak into the microphone (Ctrl+C to exit)...")
    try:
        while True:
            # read() takes a frame count: 3200 frames is 200 ms at 16 kHz
            audio_data = mic.read(3200, exception_on_overflow=False)
            conv.append_audio(base64.b64encode(audio_data).decode())
            time.sleep(0.01)
    except KeyboardInterrupt:
        # Clean up resources
        conv.close()
        mic.close()
        # The output stream exists only if on_open was called
        if callback.out is not None:
            callback.out.close()
        pya.terminate()
        print("\nConversation ended")

    Run vad_dash.py to have a real-time conversation with Qwen-Omni-Realtime through your microphone. The system detects the start and end of your speech and sends the audio to the server automatically, with no manual intervention.

    Manual mode

    Create a new Python file named manual_dash.py and copy the following code into it:

    manual_dash.py

    # Dependencies: dashscope >= 1.23.9, pyaudio.
    import os
    import base64
    import sys
    import threading
    import pyaudio
    from dashscope.audio.qwen_omni import *
    import dashscope
    
    # If you have not set the environment variable, replace the following line with your API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv('DASHSCOPE_API_KEY')
    voice = 'Cherry'
    
    class MyCallback(OmniRealtimeCallback):
        """Minimal callback: initializes the speaker after connecting and plays returned audio immediately."""
        def __init__(self, ctx):
            super().__init__()
            self.ctx = ctx
    
        def on_open(self) -> None:
            # Initialize PyAudio and the speaker (24 kHz, mono, 16-bit) after connecting.
            print('connection opened')
            try:
                self.ctx['pya'] = pyaudio.PyAudio()
                self.ctx['out'] = self.ctx['pya'].open(
                    format=pyaudio.paInt16,
                    channels=1,
                    rate=24000,
                    output=True
                )
                print('audio output initialized')
            except Exception as e:
                print('[Error] audio init failed: {}'.format(e))
    
        def on_close(self, close_status_code, close_msg) -> None:
            print('connection closed with code: {}, msg: {}'.format(close_status_code, close_msg))
            sys.exit(0)
    
        def on_event(self, response: dict) -> None:
            try:
                t = response['type']
                handlers = {
                    'session.created': lambda r: print('start session: {}'.format(r['session']['id'])),
                    'conversation.item.input_audio_transcription.completed': lambda r: print('question: {}'.format(r['transcript'])),
                    'response.audio_transcript.delta': lambda r: print('llm text: {}'.format(r['delta'])),
                    'response.audio.delta': self._play_audio,
                    'response.done': self._response_done,
                }
                h = handlers.get(t)
                if h:
                    h(response)
            except Exception as e:
                print('[Error] {}'.format(e))
    
        def _play_audio(self, response):
            # Decode the Base64 payload and write it to the output stream for playback.
            if self.ctx['out'] is None:
                return
            try:
                data = base64.b64decode(response['delta'])
                self.ctx['out'].write(data)
            except Exception as e:
                print('[Error] audio playback failed: {}'.format(e))
    
        def _response_done(self, response):
            # Print latency metrics, then signal that the current turn is complete so the waiting main loop can continue.
            if self.ctx['conv'] is not None:
                print('[Metric] response: {}, first text delay: {}, first audio delay: {}'.format(
                    self.ctx['conv'].get_last_response_id(),
                    self.ctx['conv'].get_last_first_text_delay(),
                    self.ctx['conv'].get_last_first_audio_delay(),
                ))
            if self.ctx['resp_done'] is not None:
                self.ctx['resp_done'].set()
    
    def shutdown_ctx(ctx):
        """Safely release the audio stream and PyAudio resources."""
        try:
            if ctx['out'] is not None:
                ctx['out'].close()
                ctx['out'] = None
        except Exception:
            pass
        try:
            if ctx['pya'] is not None:
                ctx['pya'].terminate()
                ctx['pya'] = None
        except Exception:
            pass
    
    
    def record_until_enter(pya_inst: pyaudio.PyAudio, sample_rate=16000, chunk_size=3200):
        """Record until the user presses Enter, then return the recorded PCM bytes."""
        frames = []
        stop_evt = threading.Event()
    
        stream = pya_inst.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=sample_rate,
            input=True,
            frames_per_buffer=chunk_size
        )
    
        def _reader():
            while not stop_evt.is_set():
                try:
                    frames.append(stream.read(chunk_size, exception_on_overflow=False))
                except Exception:
                    break
    
        t = threading.Thread(target=_reader, daemon=True)
        t.start()
        input()  # The user presses Enter again to stop recording.
        stop_evt.set()
        t.join(timeout=1.0)
        try:
            stream.close()
        except Exception:
            pass
        return b''.join(frames)
    
    
    if __name__  == '__main__':
        print('Initializing ...')
        # Runtime context: stores the audio handles and the session.
        ctx = {'pya': None, 'out': None, 'conv': None, 'resp_done': threading.Event()}
        callback = MyCallback(ctx)
        conversation = OmniRealtimeConversation(
            model='qwen3.5-omni-plus-realtime',
            callback=callback,
            # The following URL is for the International (Singapore) region. If you use a model in China (Beijing), replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime
            url="wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime",
        )
        try:
            conversation.connect()
        except Exception as e:
            print('[Error] connect failed: {}'.format(e))
            sys.exit(1)
    
        ctx['conv'] = conversation
        # Session configuration: enable text and audio output (disable server-side VAD and switch to manual recording).
        conversation.update_session(
            output_modalities=[MultiModality.AUDIO, MultiModality.TEXT],
            voice=voice,
            enable_input_audio_transcription=True,
            # Model used to transcribe the input audio. Only gummy-realtime-v1 is supported.
            input_audio_transcription_model='gummy-realtime-v1',
            enable_turn_detection=False,
            instructions="You are Xiaoyun, a personal assistant. Please answer the user's questions accurately and friendly, always responding with a helpful attitude."
        )
    
        try:
            turn = 1
            while True:
                print(f"\n--- Turn {turn} ---")
                print("Press Enter to start recording (enter q to exit)...")
                user_input = input()
                if user_input.strip().lower() in ['q', 'quit']:
                    print("User requested to exit...")
                    break
                print("Recording... Press Enter again to stop.")
                if ctx['pya'] is None:
                    ctx['pya'] = pyaudio.PyAudio()
                recorded = record_until_enter(ctx['pya'])
                if not recorded:
                    print("No valid audio was recorded. Please try again.")
                    continue
                print(f"Successfully recorded audio: {len(recorded)} bytes. Sending...")
    
                # Send in 3200-byte chunks (100 ms of 16 kHz, 16-bit mono audio).
                chunk_size = 3200
                for i in range(0, len(recorded), chunk_size):
                    chunk = recorded[i:i+chunk_size]
                    conversation.append_audio(base64.b64encode(chunk).decode('ascii'))
    
                print("Sending complete. Waiting for model response...")
                ctx['resp_done'].clear()
                conversation.commit()
                conversation.create_response()
                ctx['resp_done'].wait()
                print('Audio playback complete.')
                turn += 1
        except KeyboardInterrupt:
            print("\nProgram interrupted by user.")
        finally:
            shutdown_ctx(ctx)
            print("Program exited.")

    Run manual_dash.py. Press Enter to start speaking, then press Enter again to stop recording and receive the model's audio response.
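
The 3200-byte chunk size used when sending recorded audio in manual_dash.py follows from the input audio format (16 kHz, 16-bit, mono, 100 ms). A short worked sketch of that arithmetic is shown below. The function names are hypothetical helpers, not part of the DashScope SDK.

```python
def pcm_chunk_bytes(sample_rate_hz, bits_per_sample, channels, duration_ms):
    """Return the size in bytes of duration_ms of raw PCM audio."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * duration_ms // 1000


def pcm_duration_ms(num_bytes, sample_rate_hz, bits_per_sample=16, channels=1):
    """Return the playback duration in milliseconds of num_bytes of raw PCM."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return num_bytes * 1000 / bytes_per_second


# 16000 samples/s * 2 bytes/sample * 1 channel = 32000 bytes/s, so 100 ms is 3200 bytes.
print(pcm_chunk_bytes(16000, 16, 1, 100))  # 3200
# The same number of bytes lasts about 66.7 ms at the 24 kHz output rate.
print(pcm_duration_ms(3200, 24000))
```

Note that pyaudio's stream.read() takes a frame count rather than a byte count, so read(3200) on a 16-bit mono stream returns 6400 bytes.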

DashScope Java SDK

Choose an interaction mode

  • VAD mode (voice activity detection, automatic detection of the start and end of speech)

    The Realtime API automatically detects when the user starts and stops speaking, and responds.

  • Manual mode (press-to-talk, release-to-send)

    The client controls turn timing. After the user finishes speaking, the client commits the audio and requests a response from the server.

VAD mode

OmniServerVad.java

import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Base64;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class OmniServerVad {
    static class SequentialAudioPlayer {
        private final SourceDataLine line;
        private final Queue<byte[]> audioQueue = new ConcurrentLinkedQueue<>();
        private final Thread playerThread;
        private final AtomicBoolean shouldStop = new AtomicBoolean(false);

        public SequentialAudioPlayer() throws LineUnavailableException {
            AudioFormat format = new AudioFormat(24000, 16, 1, true, false);
            line = AudioSystem.getSourceDataLine(format);
            line.open(format);
            line.start();

            playerThread = new Thread(() -> {
                while (!shouldStop.get()) {
                    byte[] audio = audioQueue.poll();
                    if (audio != null) {
                        line.write(audio, 0, audio.length);
                    } else {
                        try { Thread.sleep(10); } catch (InterruptedException ignored) {}
                    }
                }
            }, "AudioPlayer");
            playerThread.start();
        }

        public void play(String base64Audio) {
            try {
                byte[] audio = Base64.getDecoder().decode(base64Audio);
                audioQueue.add(audio);
            } catch (Exception e) {
                System.err.println("Audio decoding failed: " + e.getMessage());
            }
        }

        public void cancel() {
            audioQueue.clear();
            line.flush();
        }

        public void close() {
            shouldStop.set(true);
            try { playerThread.join(1000); } catch (InterruptedException ignored) {}
            line.drain();
            line.close();
        }
    }

    public static void main(String[] args) {
        try {
            SequentialAudioPlayer player = new SequentialAudioPlayer();
            AtomicBoolean userIsSpeaking = new AtomicBoolean(false);
            AtomicBoolean shouldStop = new AtomicBoolean(false);

            OmniRealtimeParam param = OmniRealtimeParam.builder()
                    .model("qwen3.5-omni-plus-realtime")
                    .apikey(System.getenv("DASHSCOPE_API_KEY"))
                    // The following URL is for the International (Singapore) region. If you use a model in China (Beijing), replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                    .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                    .build();

            OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, new OmniRealtimeCallback() {
                @Override public void onOpen() {
                    System.out.println("Connection established");
                }
                @Override public void onClose(int code, String reason) {
                    System.out.println("Connection closed (" + code + "): " + reason);
                    shouldStop.set(true);
                }
                @Override public void onEvent(JsonObject event) {
                    handleEvent(event, player, userIsSpeaking);
                }
            });

            conversation.connect();
            conversation.updateSession(OmniRealtimeConfig.builder()
                    .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
                    .voice("Cherry")
                    .enableTurnDetection(true)
                    .enableInputAudioTranscription(true)
                    .parameters(Map.of("instructions",
                            "You are an AI customer service agent for a five-star hotel. Answer customer inquiries about room types, facilities, prices, and booking policies accurately and friendly. Always respond with a professional and helpful attitude. Do not provide unconfirmed information or information beyond the scope of the hotel's services."))
                    .build()
            );

            System.out.println("Please start speaking (automatic detection of speech start/end, press Ctrl+C to exit)...");
            AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
            TargetDataLine mic = AudioSystem.getTargetDataLine(format);
            mic.open(format);
            mic.start();

            ByteBuffer buffer = ByteBuffer.allocate(3200);
            while (!shouldStop.get()) {
                int bytesRead = mic.read(buffer.array(), 0, buffer.capacity());
                if (bytesRead > 0) {
                    try {
                        // Encode only the bytes that were actually read, not the whole buffer.
                        conversation.appendAudio(Base64.getEncoder().encodeToString(Arrays.copyOf(buffer.array(), bytesRead)));
                    } catch (Exception e) {
                        if (e.getMessage() != null && e.getMessage().contains("closed")) {
                            System.out.println("Conversation closed. Stopping recording.");
                            break;
                        }
                    }
                }
                Thread.sleep(20);
            }

            conversation.close(1000, "Normal exit");
            player.close();
            mic.close();
            System.out.println("\nProgram exited.");

        } catch (NoApiKeyException e) {
            System.err.println("API KEY not found: Please set the DASHSCOPE_API_KEY environment variable.");
            System.exit(1);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void handleEvent(JsonObject event, SequentialAudioPlayer player, AtomicBoolean userIsSpeaking) {
        String type = event.get("type").getAsString();
        switch (type) {
            case "input_audio_buffer.speech_started":
                System.out.println("\n[User started speaking]");
                player.cancel();
                userIsSpeaking.set(true);
                break;
            case "input_audio_buffer.speech_stopped":
                System.out.println("[User stopped speaking]");
                userIsSpeaking.set(false);
                break;
            case "response.audio.delta":
                if (!userIsSpeaking.get()) {
                    player.play(event.get("delta").getAsString());
                }
                break;
            case "conversation.item.input_audio_transcription.completed":
                System.out.println("User: " + event.get("transcript").getAsString());
                break;
            case "response.audio_transcript.delta":
                System.out.print(event.get("delta").getAsString());
                break;
            case "response.done":
                System.out.println("Response complete");
                break;
        }
    }
}

Run the OmniServerVad.main() method to have a real-time conversation with the Realtime model through your microphone. The system detects the start and end of your speech and sends the audio to the server automatically, with no manual intervention.
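
The handleEvent method in OmniServerVad.java interrupts playback when the user starts speaking: it clears queued audio on input_audio_buffer.speech_started and ignores audio deltas until input_audio_buffer.speech_stopped. A minimal Python sketch of the same logic follows. The class name BargeInPlayer is hypothetical, and the queue stands in for a real audio output stream.

```python
import base64
from collections import deque


class BargeInPlayer:
    """Queue model audio, but drop it while the user is speaking."""

    def __init__(self):
        self.queue = deque()  # Decoded PCM chunks waiting to be played.
        self.user_is_speaking = False

    def handle(self, event):
        kind = event.get("type")
        if kind == "input_audio_buffer.speech_started":
            # The user interrupted: discard audio that has not been played yet.
            self.queue.clear()
            self.user_is_speaking = True
        elif kind == "input_audio_buffer.speech_stopped":
            self.user_is_speaking = False
        elif kind == "response.audio.delta" and not self.user_is_speaking:
            self.queue.append(base64.b64decode(event["delta"]))
```

A separate playback thread would pop chunks from the queue and write them to the output stream, as the SequentialAudioPlayer class does in the Java sample.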

Manual mode

OmniWithoutServerVad.java

// DashScope Java SDK version 2.20.9 or later

import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class Main {
    // Start of the RealtimePcmPlayer class definition
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // The constructor initializes the audio format and the audio line.
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Plays an audio chunk and blocks until playback finishes.
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;

            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
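            // Chunk duration in ms: 16-bit mono PCM uses sampleRate * 2 bytes per second.
            int audioLength = chunk.length / (this.sampleRate*2/1000);
            // Wait for the buffered audio to finish playing. Clamp at 0 because Thread.sleep rejects negative values.
            Thread.sleep(Math.max(0, audioLength - 10));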
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    } // End of the RealtimePcmPlayer class definition
    // Recording method
    private static void recordAndSend(TargetDataLine line, OmniRealtimeConversation conversation) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[3200];
        AtomicBoolean stopRecording = new AtomicBoolean(false);

        // Start a thread that listens for the Enter key.
        Thread enterKeyListener = new Thread(() -> {
            try {
                System.in.read();
                stopRecording.set(true);
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        enterKeyListener.start();

        // Recording loop
        while (!stopRecording.get()) {
            int count = line.read(buffer, 0, buffer.length);
            if (count > 0) {
                out.write(buffer, 0, count);
            }
        }

        // Send the recorded data.
        byte[] audioData = out.toByteArray();
        String audioB64 = Base64.getEncoder().encodeToString(audioData);
        conversation.appendAudio(audioB64);
        out.close();
    }

    public static void main(String[] args) throws InterruptedException, LineUnavailableException {
        OmniRealtimeParam param = OmniRealtimeParam.builder()
                .model("qwen3.5-omni-plus-realtime")
                // The API keys for Singapore and Beijing are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured the environment variable, replace the following line with your Model Studio API key: .apikey("sk-xxx")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                .build();
        AtomicReference<CountDownLatch> responseDoneLatch = new AtomicReference<>(null);
        responseDoneLatch.set(new CountDownLatch(1));

        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
        final AtomicReference<OmniRealtimeConversation> conversationRef = new AtomicReference<>(null);
        OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, new OmniRealtimeCallback() {
            @Override
            public void onOpen() {
                System.out.println("connection opened");
            }
            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch(type) {
                    case "session.created":
                        System.out.println("start session: " + message.get("session").getAsJsonObject().get("id").getAsString());
                        break;
                    case "conversation.item.input_audio_transcription.completed":
                        System.out.println("question: " + message.get("transcript").getAsString());
                        break;
                    case "response.audio_transcript.delta":
                        System.out.println("got llm response delta: " + message.get("delta").getAsString());
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        System.out.println("======RESPONSE DONE======");
                        if (conversationRef.get() != null) {
                            System.out.println("[Metric] response: " + conversationRef.get().getResponseId() +
                                    ", first text delay: " + conversationRef.get().getFirstTextDelay() +
                                    " ms, first audio delay: " + conversationRef.get().getFirstAudioDelay() + " ms");
                        }
                        responseDoneLatch.get().countDown();
                        break;
                    default:
                        break;
                }
            }
            @Override
            public void onClose(int code, String reason) {
                System.out.println("connection closed code: " + code + ", reason: " + reason);
            }
        });
        conversationRef.set(conversation);
        try {
            conversation.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        OmniRealtimeConfig config = OmniRealtimeConfig.builder()
                .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
                .voice("Cherry")
                .enableTurnDetection(false)
                // Set the model's role.
                .parameters(new HashMap<String, Object>() {{
                    put("instructions","You are Xiaoyun, a personal assistant. Please answer the user's questions accurately and friendly, always responding with a helpful attitude.");
                }})
                .build();
        conversation.updateSession(config);

        // Set up microphone recording.
        AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);

        if (!AudioSystem.isLineSupported(info)) {
            System.out.println("Line not supported");
            return;
        }

        TargetDataLine line = null;
        try {
            line = (TargetDataLine) AudioSystem.getLine(info);
            line.open(format);
            line.start();

            while (true) {
                System.out.println("Press Enter to start recording...");
                System.in.read();
                System.out.println("Recording started. Please speak... Press Enter again to stop recording and send.");
                recordAndSend(line, conversation);
                conversation.commit();
                conversation.createResponse(null, null);
                // Reset the latch for the next turn.
                responseDoneLatch.set(new CountDownLatch(1));
            }
        } catch (LineUnavailableException | IOException e) {
            e.printStackTrace();
        } finally {
            if (line != null) {
                line.stop();
                line.close();
            }
        }
    }}

Run the OmniWithoutServerVad.main() method. Press Enter to start recording. While recording, press Enter again to stop and send the audio. The program then receives and plays the model's response.
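
The playChunk method in the Java sample estimates playback time from the byte length of each chunk. As a minimal sketch of that arithmetic (the helper name pcm_duration_ms is hypothetical and not part of any SDK), 16-bit mono PCM occupies sample_rate × 2 bytes per second:

```python
def pcm_duration_ms(num_bytes: int, sample_rate: int,
                    sample_width: int = 2, channels: int = 1) -> float:
    """Return the playback duration, in milliseconds, of raw PCM data."""
    bytes_per_second = sample_rate * sample_width * channels
    return num_bytes * 1000 / bytes_per_second


if __name__ == "__main__":
    # 4,800 bytes of 24 kHz output audio
    print(pcm_duration_ms(4800, 24000))
    # 3,200 bytes of 16 kHz input audio (the Java recording buffer size)
    print(pcm_duration_ms(3200, 16000))
```

By the same calculation, the 3,200-byte recording buffer in recordAndSend holds 100 ms of 16 kHz input audio.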

WebSocket (Python)

  • Prepare the runtime environment

    Your Python version must be 3.10 or later.

    First, install pyaudio for your operating system.

    macOS

    brew install portaudio && pip install pyaudio

    Debian/Ubuntu

    sudo apt-get install python3-pyaudio
    
    or
    
    pip install pyaudio
    We recommend using pip install pyaudio. If the installation fails, first install the portaudio dependency for your operating system.

    CentOS

    sudo yum install -y portaudio portaudio-devel && pip install pyaudio

    Windows

    pip install pyaudio

    After installation, install the WebSocket dependency with pip:

    pip install websockets==15.0.1
  • Create the client

    Create a new Python file named omni_realtime_client.py in a local directory and copy the following code into it:

    omni_realtime_client.py

    import asyncio
    import websockets
    import json
    import base64
    import time
    from typing import Optional, Callable, List, Dict, Any
    from enum import Enum
    
    class TurnDetectionMode(Enum):
        SERVER_VAD = "server_vad"
        MANUAL = "manual"
    
    class OmniRealtimeClient:
    
        def __init__(
                self,
                base_url,
                api_key: str,
                model: str = "",
                voice: str = "Ethan",
                instructions: str = "You are a helpful assistant.",
                turn_detection_mode: TurnDetectionMode = TurnDetectionMode.SERVER_VAD,
                on_text_delta: Optional[Callable[[str], None]] = None,
                on_audio_delta: Optional[Callable[[bytes], None]] = None,
                on_input_transcript: Optional[Callable[[str], None]] = None,
                on_output_transcript: Optional[Callable[[str], None]] = None,
                extra_event_handlers: Optional[Dict[str, Callable[[Dict[str, Any]], None]]] = None
        ):
            self.base_url = base_url
            self.api_key = api_key
            self.model = model
            self.voice = voice
            self.instructions = instructions
            self.ws = None
            self.on_text_delta = on_text_delta
            self.on_audio_delta = on_audio_delta
            self.on_input_transcript = on_input_transcript
            self.on_output_transcript = on_output_transcript
            self.turn_detection_mode = turn_detection_mode
            self.extra_event_handlers = extra_event_handlers or {}
    
            # Current response state
            self._current_response_id = None
            self._current_item_id = None
            self._is_responding = False
            # Input/output transcript printing state
            self._print_input_transcript = True
            self._output_transcript_buffer = ""
    
        async def connect(self) -> None:
            """Establishes a WebSocket connection to the Realtime API."""
            url = f"{self.base_url}?model={self.model}"
            headers = {
                "Authorization": f"Bearer {self.api_key}"
            }
            self.ws = await websockets.connect(url, additional_headers=headers)
    
            # Session configuration
            session_config = {
                "modalities": ["text", "audio"],
                "voice": self.voice,
                "instructions": self.instructions,
                "input_audio_format": "pcm",
                "output_audio_format": "pcm",
                "input_audio_transcription": {
                    "model": "gummy-realtime-v1"
                }
            }
    
            if self.turn_detection_mode == TurnDetectionMode.MANUAL:
                session_config['turn_detection'] = None
                await self.update_session(session_config)
            elif self.turn_detection_mode == TurnDetectionMode.SERVER_VAD:
                session_config['turn_detection'] = {
                    "type": "server_vad",
                    "threshold": 0.1,
                    "prefix_padding_ms": 500,
                    "silence_duration_ms": 900
                }
                await self.update_session(session_config)
            else:
                raise ValueError(f"Invalid turn detection mode: {self.turn_detection_mode}")
    
        async def send_event(self, event) -> None:
            event['event_id'] = "event_" + str(int(time.time() * 1000))
            await self.ws.send(json.dumps(event))
    
        async def update_session(self, config: Dict[str, Any]) -> None:
            """Updates the session configuration."""
            event = {
                "type": "session.update",
                "session": config
            }
            await self.send_event(event)
    
        async def stream_audio(self, audio_chunk: bytes) -> None:
            """Streams raw audio data to the API."""
            # Only 16-bit, 16 kHz, mono PCM is supported.
            audio_b64 = base64.b64encode(audio_chunk).decode()
            append_event = {
                "type": "input_audio_buffer.append",
                "audio": audio_b64
            }
            await self.send_event(append_event)
    
        async def commit_audio_buffer(self) -> None:
            """Commits the audio buffer to trigger processing."""
            event = {
                "type": "input_audio_buffer.commit"
            }
            await self.send_event(event)
    
        async def append_image(self, image_chunk: bytes) -> None:
            """Appends image data to the image buffer.
            Image data can come from a local file or a real-time video stream.
            Notes:
                - The image format must be JPG or JPEG. We recommend a resolution of 480p or 720p. The maximum supported resolution is 1080p.
                - A single image must not exceed 500 KB.
                - Encode the image data to Base64 before sending.
                - We recommend sending images at no more than 1 frame per second.
                - You must send audio data at least once before sending image data.
            """
            image_b64 = base64.b64encode(image_chunk).decode()
            event = {
                "type": "input_image_buffer.append",
                "image": image_b64
            }
            await self.send_event(event)
    
        async def create_response(self) -> None:
            """Asks the API to generate a response (only required in manual mode)."""
            event = {
                "type": "response.create"
            }
            await self.send_event(event)
    
        async def cancel_response(self) -> None:
            """Cancels the current response."""
            event = {
                "type": "response.cancel"
            }
            await self.send_event(event)
    
        async def handle_interruption(self):
            """Handles a user interruption of the current response."""
            if not self._is_responding:
                return
            # 1. Cancel the current response.
            if self._current_response_id:
                await self.cancel_response()
    
            self._is_responding = False
            self._current_response_id = None
            self._current_item_id = None
    
        async def handle_messages(self) -> None:
            try:
                async for message in self.ws:
                    event = json.loads(message)
                    event_type = event.get("type")
                    if event_type == "error":
                        print(" Error: ", event['error'])
                        continue
                    elif event_type == "response.created":
                        self._current_response_id = event.get("response", {}).get("id")
                        self._is_responding = True
                    elif event_type == "response.output_item.added":
                        self._current_item_id = event.get("item", {}).get("id")
                    elif event_type == "response.done":
                        self._is_responding = False
                        self._current_response_id = None
                        self._current_item_id = None
                    elif event_type == "input_audio_buffer.speech_started":
                        print("Speech start detected")
                        if self._is_responding:
                            print("Handling interruption")
                            await self.handle_interruption()
                    elif event_type == "input_audio_buffer.speech_stopped":
                        print("Speech end detected")
                    elif event_type == "response.text.delta":
                        if self.on_text_delta:
                            self.on_text_delta(event["delta"])
                    elif event_type == "response.audio.delta":
                        if self.on_audio_delta:
                            audio_bytes = base64.b64decode(event["delta"])
                            self.on_audio_delta(audio_bytes)
                    elif event_type == "conversation.item.input_audio_transcription.completed":
                        transcript = event.get("transcript", "")
                        print(f"User: {transcript}")
                        if self.on_input_transcript:
                            await asyncio.to_thread(self.on_input_transcript, transcript)
                            self._print_input_transcript = True
                    elif event_type == "response.audio_transcript.delta":
                        if self.on_output_transcript:
                            delta = event.get("delta", "")
                            if not self._print_input_transcript:
                                self._output_transcript_buffer += delta
                            else:
                                if self._output_transcript_buffer:
                                    await asyncio.to_thread(self.on_output_transcript, self._output_transcript_buffer)
                                    self._output_transcript_buffer = ""
                                await asyncio.to_thread(self.on_output_transcript, delta)
                    elif event_type == "response.audio_transcript.done":
                        print(f"LLM: {event.get('transcript', '')}")
                        self._print_input_transcript = False
                    elif event_type in self.extra_event_handlers:
                        self.extra_event_handlers[event_type](event)
            except websockets.exceptions.ConnectionClosed:
                print(" Connection closed")
            except Exception as e:
                print(" Error in message handling: ", str(e))
        async def close(self) -> None:
            """Closes the WebSocket connection."""
            if self.ws:
                await self.ws.close()
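
    The append_image docstring above lists several input constraints. The following is a minimal sketch of a client-side check before sending, under stated assumptions: the helper name build_image_event is hypothetical, "500 KB" is read as 500 × 1024 bytes, and the JPEG check only inspects the standard FF D8 start-of-image marker rather than validating the whole file.

```python
import base64

MAX_IMAGE_BYTES = 500 * 1024  # assumption: "500 KB" read as 500 * 1024 bytes


def build_image_event(image_bytes: bytes) -> dict:
    """Build an input_image_buffer.append event after basic sanity checks."""
    # JPEG data always begins with the start-of-image marker FF D8.
    if not image_bytes.startswith(b"\xff\xd8"):
        raise ValueError("image does not look like a JPEG (missing FF D8 marker)")
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError(
            f"image is {len(image_bytes)} bytes; the limit is {MAX_IMAGE_BYTES}"
        )
    return {
        "type": "input_image_buffer.append",
        "image": base64.b64encode(image_bytes).decode(),
    }
```

    Rejecting oversized or non-JPEG frames locally avoids a round trip to the server for input it would not accept.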
  • Choose an interaction mode

    • VAD mode (voice activity detection; the start and end of speech are detected automatically)

      The Realtime API automatically detects when the user is speaking and responds.

    • Manual mode (press to talk, release to send)

      The client controls when speech starts and ends. After the user finishes speaking, the client commits the audio buffer and asks the server for a response.
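
    In terms of the session.update event that OmniRealtimeClient.connect() sends, the two modes differ only in the turn_detection field. The following is a minimal sketch that reuses the values from omni_realtime_client.py above. The helper names turn_detection_config and session_update_event are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def turn_detection_config(server_vad: bool) -> Optional[Dict[str, Any]]:
    """Return the turn_detection value for a session.update event."""
    if not server_vad:
        # Manual mode: the client sends input_audio_buffer.commit
        # and response.create itself.
        return None
    # VAD mode: values copied from the client sample, not tuned recommendations.
    return {
        "type": "server_vad",
        "threshold": 0.1,
        "prefix_padding_ms": 500,
        "silence_duration_ms": 900,
    }


def session_update_event(server_vad: bool) -> str:
    """Serialize a session.update event for the chosen mode."""
    event = {
        "type": "session.update",
        "session": {"turn_detection": turn_detection_config(server_vad)},
    }
    return json.dumps(event)


if __name__ == "__main__":
    print(session_update_event(server_vad=False))
    print(session_update_event(server_vad=True))
```

    In manual mode the field serializes to JSON null, which matches the client setting session_config['turn_detection'] = None.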

    VAD mode

    In the same directory as omni_realtime_client.py, create another Python file named vad_mode.py and copy the following code into it:

    vad_mode.py

    # -*- coding: utf-8 -*-
    import os, asyncio, pyaudio, queue, threading
    from omni_realtime_client import OmniRealtimeClient, TurnDetectionMode
    
    # Audio player class (handles interruptions)
    class AudioPlayer:
        def __init__(self, pyaudio_instance, rate=24000):
            self.stream = pyaudio_instance.open(format=pyaudio.paInt16, channels=1, rate=rate, output=True)
            self.queue = queue.Queue()
            self.stop_evt = threading.Event()
            self.interrupt_evt = threading.Event()
            threading.Thread(target=self._run, daemon=True).start()
    
        def _run(self):
            while not self.stop_evt.is_set():
                try:
                    data = self.queue.get(timeout=0.5)
                    if data is None: break
                    if not self.interrupt_evt.is_set(): self.stream.write(data)
                    self.queue.task_done()
                except queue.Empty: continue
    
        def add_audio(self, data): self.queue.put(data)
        def handle_interrupt(self): self.interrupt_evt.set(); self.queue.queue.clear()
        def stop(self): self.stop_evt.set(); self.queue.put(None); self.stream.stop_stream(); self.stream.close()
    
    # Record from the microphone and send
    async def record_and_send(client):
        p = pyaudio.PyAudio()
        stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=3200)
        print("Recording started. Please speak...")
        try:
            while True:
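                # Read in a worker thread so the blocking call does not stall the event loop.
                audio_data = await asyncio.to_thread(stream.read, 3200)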
                await client.stream_audio(audio_data)
                await asyncio.sleep(0.02)
        finally:
            stream.stop_stream(); stream.close(); p.terminate()
    
    async def main():
        p = pyaudio.PyAudio()
        player = AudioPlayer(pyaudio_instance=p)
    
        client = OmniRealtimeClient(
            # The following base_url is for the International (Singapore) region. The base_url for the China (Beijing) region is wss://dashscope.aliyuncs.com/api-ws/v1/realtime
            base_url="wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime",
            api_key=os.environ.get("DASHSCOPE_API_KEY"),
            model="qwen3.5-omni-plus-realtime",
            voice="Cherry",
            instructions="You are Xiaoyun, a witty and humorous assistant.",
            turn_detection_mode=TurnDetectionMode.SERVER_VAD,
            on_text_delta=lambda t: print(f"\nAssistant: {t}", end="", flush=True),
            on_audio_delta=player.add_audio,
        )
    
        await client.connect()
        print("Connection successful. Starting real-time conversation...")
    
        # Run both tasks concurrently
        await asyncio.gather(client.handle_messages(), record_and_send(client))
    
    if __name__ == "__main__":
        try:
            asyncio.run(main())
        except KeyboardInterrupt:
            print("\nProgram exited.")

    Run vad_mode.py to hold a real-time conversation with the Realtime model through your microphone. The client streams microphone audio continuously and the server detects the start and end of speech, so no manual intervention is needed.
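
    The AudioPlayer in vad_mode.py defines handle_interrupt, which is meant to drop queued audio when the user starts speaking over the model. The following is a minimal sketch of draining a playback queue through the public get_nowait call instead of reaching into the queue's internal deque. The function name drain_queue is hypothetical.

```python
import queue


def drain_queue(q: "queue.Queue[bytes]") -> int:
    """Discard everything currently queued and return the number of items dropped."""
    dropped = 0
    while True:
        try:
            q.get_nowait()
        except queue.Empty:
            return dropped
        # Keep the unfinished-task counter consistent in case q.join() is used.
        q.task_done()
        dropped += 1
```

    A player could call drain_queue(self.queue) from its interrupt handler so that stale audio from a cancelled response is not played.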

    Manual mode

    In the same directory as omni_realtime_client.py, create another Python file named manual_mode.py and copy the following code into it:

    manual_mode.py

    # -*- coding: utf-8 -*-
    import os
    import asyncio
    import time
    import threading
    import queue
    import pyaudio
    from omni_realtime_client import OmniRealtimeClient, TurnDetectionMode
    
    
    class AudioPlayer:
        """Real-time audio player"""
    
        def __init__(self, sample_rate=24000, channels=1, sample_width=2):
            self.sample_rate = sample_rate
            self.channels = channels
            self.sample_width = sample_width  # 2 bytes for 16-bit
            self.audio_queue = queue.Queue()
            self.is_playing = False
            self.play_thread = None
            self.pyaudio_instance = None
            self.stream = None
            self._lock = threading.Lock()  # Lock for synchronized access
            self._last_data_time = time.time()  # Time the last data was received
            self._response_done = False  # Flag indicating that the response is complete
            self._waiting_for_response = False  # Flag indicating that a server response is pending
            # Track when data was last written to the audio stream and the duration of the latest chunk, for more accurate end-of-playback detection
            self._last_play_time = time.time()
            self._last_chunk_duration = 0.0
    
        def start(self):
            """Start the audio player"""
            with self._lock:
                if self.is_playing:
                    return
    
                self.is_playing = True
    
                try:
                    self.pyaudio_instance = pyaudio.PyAudio()
    
                    # Create the audio output stream
                    self.stream = self.pyaudio_instance.open(
                        format=pyaudio.paInt16,  # 16-bit
                        channels=self.channels,
                        rate=self.sample_rate,
                        output=True,
                        frames_per_buffer=1024
                    )
    
                    # Start the playback thread
                    self.play_thread = threading.Thread(target=self._play_audio)
                    self.play_thread.daemon = True
                    self.play_thread.start()
    
                    print("Audio player started")
                except Exception as e:
                    print(f"Failed to start audio player: {e}")
                    self._cleanup_resources()
                    raise
    
        def stop(self):
            """Stop the audio player"""
            with self._lock:
                if not self.is_playing:
                    return
    
                self.is_playing = False
    
            # Empty the queue
            while not self.audio_queue.empty():
                try:
                    self.audio_queue.get_nowait()
                except queue.Empty:
                    break
    
            # Wait for the playback thread to finish (wait outside the lock to avoid a deadlock)
            if self.play_thread and self.play_thread.is_alive():
                self.play_thread.join(timeout=2.0)
    
            # Reacquire the lock to clean up resources
            with self._lock:
                self._cleanup_resources()
    
            print("Audio player stopped")
    
        def _cleanup_resources(self):
            """Clean up audio resources (must be called while holding the lock)"""
            try:
                # Close the audio stream
                if self.stream:
                    if not self.stream.is_stopped():
                        self.stream.stop_stream()
                    self.stream.close()
                    self.stream = None
            except Exception as e:
                print(f"Error closing audio stream: {e}")
    
            try:
                if self.pyaudio_instance:
                    self.pyaudio_instance.terminate()
                    self.pyaudio_instance = None
            except Exception as e:
                print(f"Error terminating PyAudio: {e}")
    
        def add_audio_data(self, audio_data):
            """Add audio data to the playback queue"""
            if self.is_playing and audio_data:
                self.audio_queue.put(audio_data)
                with self._lock:
                    self._last_data_time = time.time()  # Update the time the last data was received
                    self._waiting_for_response = False  # Data received, no longer waiting
    
        def stop_receiving_data(self):
            """Mark that no new audio data will be received"""
            with self._lock:
                self._response_done = True
                self._waiting_for_response = False  # Response ended, no longer waiting
    
        def prepare_for_next_turn(self):
            """Reset the player state for the next conversation turn."""
            with self._lock:
                self._response_done = False
                self._last_data_time = time.time()
                self._last_play_time = time.time()
                self._last_chunk_duration = 0.0
                self._waiting_for_response = True  # Start waiting for the next response
    
            # Discard any audio data left over from the previous turn
            while not self.audio_queue.empty():
                try:
                    self.audio_queue.get_nowait()
                except queue.Empty:
                    break
    
        def is_finished_playing(self):
            """Check whether all audio data has been played"""
            with self._lock:
                queue_size = self.audio_queue.qsize()
                time_since_last_data = time.time() - self._last_data_time
                time_since_last_play = time.time() - self._last_play_time
    
                # ---------------------- Smart end-of-playback detection ----------------------
                # 1. Preferred: the server has marked the response as complete and the playback queue is empty.
                #    Wait for the latest audio chunk to finish playing (chunk duration + 0.1 s tolerance).
                if self._response_done and queue_size == 0:
                    min_wait = max(self._last_chunk_duration + 0.1, 0.5)  # Wait at least 0.5 seconds
                    if time_since_last_play >= min_wait:
                        return True
    
                # 2. Fallback: no new data has been received for more than 1 second and the playback queue is empty.
                #    This acts as a safeguard in case the server does not explicitly send `response.done`.
                if not self._waiting_for_response and queue_size == 0 and time_since_last_data > 1.0:
                    print("\n(No new audio received for a while, assuming playback is finished)")
                    return True
    
                return False
    
        def _play_audio(self):
            """Worker thread that plays audio data"""
            while True:
                # Check whether playback should stop
                with self._lock:
                    if not self.is_playing:
                        break
                    stream_ref = self.stream  # Get a reference to the stream
    
                try:
                    # Get audio data from the queue, with a 0.1 second timeout
                    audio_data = self.audio_queue.get(timeout=0.1)
    
                    # Check the playback state and stream validity again
                    with self._lock:
                        if self.is_playing and stream_ref and not stream_ref.is_stopped():
                            try:
                                # Play the audio data
                                stream_ref.write(audio_data)
                                # Update the latest playback information
                                self._last_play_time = time.time()
                                self._last_chunk_duration = len(audio_data) / (
                                            self.channels * self.sample_width) / self.sample_rate
                            except Exception as e:
                                print(f"Error writing to audio stream: {e}")
                                break
    
                    # Mark this block of data as processed
                    self.audio_queue.task_done()
    
                except queue.Empty:
                    # Keep waiting if the queue is empty
                    continue
                except Exception as e:
                    print(f"Error playing audio: {e}")
                    break
    
    
    class MicrophoneRecorder:
        """Real-time microphone recorder"""
    
        def __init__(self, sample_rate=16000, channels=1, chunk_size=3200):
            self.sample_rate = sample_rate
            self.channels = channels
            self.chunk_size = chunk_size
            self.pyaudio_instance = None
            self.stream = None
            self.frames = []
            self._is_recording = False
            self._record_thread = None
    
        def _recording_thread(self):
            """Recording worker thread"""
            # Keep reading from the audio stream while _is_recording is True
            while self._is_recording:
                try:
                    # Use exception_on_overflow=False to avoid crashing on buffer overflow
                    data = self.stream.read(self.chunk_size, exception_on_overflow=False)
                    self.frames.append(data)
                except (IOError, OSError) as e:
                    # Reading from the stream may raise an error while it is being closed
                    print(f"Error reading from recording stream, it might be closed: {e}")
                    break
    
        def start(self):
            """Start recording"""
            if self._is_recording:
                print("Recording is already in progress.")
                return
    
            self.frames = []
            self._is_recording = True
    
            try:
                self.pyaudio_instance = pyaudio.PyAudio()
                self.stream = self.pyaudio_instance.open(
                    format=pyaudio.paInt16,
                    channels=self.channels,
                    rate=self.sample_rate,
                    input=True,
                    frames_per_buffer=self.chunk_size
                )
    
                self._record_thread = threading.Thread(target=self._recording_thread)
                self._record_thread.daemon = True
                self._record_thread.start()
                print("Microphone recording started...")
            except Exception as e:
                print(f"Failed to start microphone: {e}")
                self._is_recording = False
                self._cleanup()
                raise
    
        def stop(self):
            """Stop recording and return the audio data"""
            if not self._is_recording:
                return None
    
            self._is_recording = False
    
            # Wait for the recording thread to exit safely
            if self._record_thread:
                self._record_thread.join(timeout=1.0)
    
            self._cleanup()
    
            print("Microphone recording stopped.")
            return b''.join(self.frames)
    
        def _cleanup(self):
            """Safely release PyAudio resources"""
            if self.stream:
                try:
                    if self.stream.is_active():
                        self.stream.stop_stream()
                    self.stream.close()
                except Exception as e:
                    print(f"Error closing audio stream: {e}")
    
            if self.pyaudio_instance:
                try:
                    self.pyaudio_instance.terminate()
                except Exception as e:
                    print(f"Error terminating PyAudio instance: {e}")
    
            self.stream = None
            self.pyaudio_instance = None
    
    
    async def interactive_test():
        """
        Interactive test script: supports multi-turn conversations, with audio and images sent in each turn.
        """
        # ------------------- 1. Initialization and connection (once only) -------------------
        # The API keys for Singapore and Beijing are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        api_key = os.environ.get("DASHSCOPE_API_KEY")
        if not api_key:
            print("Please set the DASHSCOPE_API_KEY environment variable.")
            return
    
        print("--- Real-time Multimodal Audio/Video Chat Client ---")
        print("Initializing audio player and client...")
    
        audio_player = AudioPlayer()
        audio_player.start()
    
        def on_audio_received(audio_data):
            audio_player.add_audio_data(audio_data)
    
        def on_response_done(event):
            print("\n(Received response end marker)")
            audio_player.stop_receiving_data()
    
        realtime_client = OmniRealtimeClient(
            # The base_url below is for the Singapore region. If you use a model in the Beijing region, replace base_url with wss://dashscope.aliyuncs.com/api-ws/v1/realtime
            base_url="wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime",
            api_key=api_key,
            model="qwen3.5-omni-plus-realtime",
            voice="Ethan",
            instructions="You are Xiaoyun, a personal assistant. Please answer the user's questions accurately and friendly, always responding with a helpful attitude.", # Set the model's role
            on_text_delta=lambda text: print(f"Assistant reply: {text}", end="", flush=True),
            on_audio_delta=on_audio_received,
            turn_detection_mode=TurnDetectionMode.MANUAL,
            extra_event_handlers={"response.done": on_response_done}
        )
    
        message_handler_task = None
        try:
            await realtime_client.connect()
            print("Connected to the server. Enter 'q' or 'quit' to exit at any time.")
            message_handler_task = asyncio.create_task(realtime_client.handle_messages())
            await asyncio.sleep(0.5)
    
            turn_counter = 1
            # ------------------- 2. Multi-turn conversation loop -------------------
            while True:
                print(f"\n--- Turn {turn_counter} ---")
                audio_player.prepare_for_next_turn()
    
                recorded_audio = None
                image_paths = []
    
                # --- Get user input: record from the microphone ---
                loop = asyncio.get_running_loop()
                recorder = MicrophoneRecorder(sample_rate=16000)  # A 16 kHz sample rate is recommended for speech recognition
    
                print("Ready to record. Press Enter to start recording (or enter 'q' to exit)...")
                user_input = await loop.run_in_executor(None, input)
                if user_input.strip().lower() in ['q', 'quit']:
                    print("User requested to exit...")
                    return
    
                try:
                    recorder.start()
                except Exception:
                    print("Could not start recording. Please check your microphone permissions and device. Skipping this turn.")
                    continue
    
                print("Recording... Press Enter again to stop.")
                await loop.run_in_executor(None, input)
    
                recorded_audio = recorder.stop()
    
                if not recorded_audio or len(recorded_audio) == 0:
                    print("No valid audio was recorded. Please start this turn again.")
                    continue
    
                # --- Get image input (optional) ---
                # The image input feature below is commented out and temporarily disabled. To enable it, uncomment the code below.
                # print("\nEnter the absolute path of an [image file] on each line (optional). When finished, enter 's' or press Enter to send the request.")
                # while True:
                #     path = input("Image path: ").strip()
                #     if path.lower() == 's' or path == '':
                #         break
                #     if path.lower() in ['q', 'quit']:
                #         print("User requested to exit...")
                #         return
                #
                #     if not os.path.isabs(path):
                #         print("Error: Please enter an absolute path.")
                #         continue
                #     if not os.path.exists(path):
                #         print(f"Error: File not found -> {path}")
                #         continue
                #     image_paths.append(path)
                #     print(f"Image added: {os.path.basename(path)}")
    
                # --- 3. Send data and get the response ---
                print("\n--- Input Confirmation ---")
                print(f"Audio to process: 1 (from microphone), Images: {len(image_paths)}")
                print("------------------")
    
                # 3.1 Send the recorded audio
                try:
                    print(f"Sending microphone recording ({len(recorded_audio)} bytes)")
                    await realtime_client.stream_audio(recorded_audio)
                    await asyncio.sleep(0.1)
                except Exception as e:
                    print(f"Failed to send microphone recording: {e}")
                    continue
    
                # 3.2 Send all image files
                # The image-sending code below is commented out and disabled.
                # for i, path in enumerate(image_paths):
                #     try:
                #         with open(path, "rb") as f:
                #             data = f.read()
                #         print(f"Sending image {i+1}: {os.path.basename(path)} ({len(data)} bytes)")
                #         await realtime_client.append_image(data)
                #         await asyncio.sleep(0.1)
                #     except Exception as e:
                #         print(f"Failed to send image {os.path.basename(path)}: {e}")
    
                # 3.3 Commit and wait for the response
                print("Submitting all inputs, requesting server response...")
                await realtime_client.commit_audio_buffer()
                await realtime_client.create_response()
    
                print("Waiting for and playing server response audio...")
                start_time = time.time()
                max_wait_time = 60
                while not audio_player.is_finished_playing():
                    if time.time() - start_time > max_wait_time:
                        print(f"\nWait timed out ({max_wait_time} seconds). Moving to the next turn.")
                        break
                    await asyncio.sleep(0.2)
    
                print("\nAudio playback for this turn is complete!")
                turn_counter += 1
    
        except (asyncio.CancelledError, KeyboardInterrupt):
            print("\nProgram was interrupted.")
        except Exception as e:
            print(f"An unhandled error occurred: {e}")
        finally:
            # ------------------- 4. Clean up resources -------------------
            print("\nClosing connection and cleaning up resources...")
            if message_handler_task and not message_handler_task.done():
                message_handler_task.cancel()
    
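            # ws.close is a method and is always truthy, so check a "closed" attribute instead (assumed; treated as False if absent)
            if realtime_client.ws and not getattr(realtime_client.ws, "closed", False):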
                await realtime_client.close()
                print("Connection closed.")
    
            audio_player.stop()
            print("Program exited.")
    
    
    if __name__ == "__main__":
        try:
            asyncio.run(interactive_test())
        except KeyboardInterrupt:
            print("\nProgram was forcibly exited by the user.")

    Run manual_mode.py. Press Enter to start speaking, then press Enter again to receive the model's audio response.

Interaction flow

VAD mode

Set session.turn_detection to "server_vad" in the session.update event to enable VAD mode. In this mode, the server automatically detects the start and end of speech and responds. This mode is suited to voice calls.

The interaction flow is as follows:

  1. The server detects the start of speech and sends the input_audio_buffer.speech_started event.

  2. The client can send input_audio_buffer.append and input_image_buffer.append events at any time to add audio and images to the buffer.

    Before sending an input_image_buffer.append event, you must send at least one input_audio_buffer.append event.
  3. The server detects the end of speech and sends the input_audio_buffer.speech_stopped event.

  4. The server sends the input_audio_buffer.committed event to commit the audio buffer.

  5. The server sends the conversation.item.created event, which contains the user message item created from the buffer.
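
The ordering rule in step 2 (at least one input_audio_buffer.append before any input_image_buffer.append) can be checked on the client before anything is sent. The following is a minimal sketch under stated assumptions: InputBufferGuard and its method names are hypothetical and not part of the DashScope SDK, and the helper only tracks event types locally without talking to the server.

```python
class InputBufferGuard:
    """Hypothetical client-side helper (not part of any SDK).

    Tracks whether audio has already been appended so that an image
    is never sent before the first audio chunk.
    """

    def __init__(self):
        self._audio_appended = False

    def can_send(self, event_type: str) -> bool:
        """Return True if an event of this type may be sent right now."""
        if event_type == "input_image_buffer.append":
            # Images are allowed only after at least one audio append.
            return self._audio_appended
        return True

    def record(self, event_type: str) -> None:
        """Register an event that is about to be sent; raise if it breaks the rule."""
        if not self.can_send(event_type):
            raise RuntimeError(
                "Send at least one input_audio_buffer.append "
                "before input_image_buffer.append"
            )
        if event_type == "input_audio_buffer.append":
            self._audio_appended = True


guard = InputBufferGuard()
print(guard.can_send("input_image_buffer.append"))  # False: no audio sent yet
guard.record("input_audio_buffer.append")
print(guard.can_send("input_image_buffer.append"))  # True
```

Calling record() just before each send keeps the check in one place, so the sending code does not need to remember the rule itself.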

| Lifecycle | Client events | Server events |
|---|---|---|
| Session initialization | session.update: Session configuration | session.created: Session created |
| | | session.updated: Session configuration updated |
| User audio input | input_audio_buffer.append: Add audio to the buffer | input_audio_buffer.speech_started: Start of speech detected |
| | input_image_buffer.append: Add an image to the buffer | input_audio_buffer.speech_stopped: End of speech detected |
| | | input_audio_buffer.committed: Server received the submitted audio |
| Server audio output | None | response.created: Server starts generating a response |
| | | response.output_item.added: New output content during the response |
| | | conversation.item.created: Conversation item created |
| | | response.content_part.added: New output content added to the assistant message |
| | | response.audio_transcript.delta: Incrementally generated transcript text |
| | | response.audio.delta: Incrementally generated audio from the model |
| | | response.audio_transcript.done: Text transcription complete |
| | | response.audio.done: Audio generation complete |
| | | response.content_part.done: Streaming of text or audio content for the assistant message complete |
| | | response.output_item.done: Streaming of the entire output item for the assistant message complete |
| | | response.done: Response complete |
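
The server events of the output phase can be folded into one object per response. The sketch below is illustrative only: ResponseCollector is a hypothetical helper, and it assumes that response.audio_transcript.delta carries its text in a "delta" field, mirroring the Base64 "delta" field that the samples on this page read from response.audio.delta.

```python
import base64


class ResponseCollector:
    """Hypothetical helper that accumulates one assistant response from server events."""

    def __init__(self):
        self.transcript_parts = []
        self.audio = bytearray()
        self.done = False

    def handle(self, event: dict) -> None:
        """Update the collected state from a single decoded server event."""
        event_type = event.get("type")
        if event_type == "response.created":
            # A new response starts: drop anything left from the previous one.
            self.transcript_parts.clear()
            self.audio.clear()
            self.done = False
        elif event_type == "response.audio_transcript.delta":
            # Assumption: incremental transcript text is in the "delta" field.
            self.transcript_parts.append(event.get("delta", ""))
        elif event_type == "response.audio.delta":
            # Audio arrives as Base64 text; decode it to raw bytes.
            self.audio.extend(base64.b64decode(event.get("delta", "")))
        elif event_type == "response.done":
            self.done = True

    @property
    def transcript(self) -> str:
        return "".join(self.transcript_parts)


collector = ResponseCollector()
for ev in (
    {"type": "response.created"},
    {"type": "response.audio_transcript.delta", "delta": "Hel"},
    {"type": "response.audio_transcript.delta", "delta": "lo"},
    {"type": "response.audio.delta", "delta": base64.b64encode(b"\x00\x01").decode()},
    {"type": "response.done"},
):
    collector.handle(ev)
print(collector.transcript, len(collector.audio), collector.done)
```

Event types the collector does not recognize are ignored, so it can sit behind a generic message handler without filtering.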

Manual mode

Set session.turn_detection to null in the session.update event to enable manual mode. In this mode, the client explicitly sends the input_audio_buffer.commit and response.create events to request a response from the server. This mode is suited to push-to-talk scenarios, such as sending voice messages in a chat app.

The interaction flow is as follows:

  1. The client can send input_audio_buffer.append and input_image_buffer.append events at any time to add audio and images to the buffer.

    Before sending an input_image_buffer.append event, you must send at least one input_audio_buffer.append event.
  2. The client sends the input_audio_buffer.commit event to commit the audio and image buffers, signaling to the server that all user input (audio and images) for the current turn has been sent.

  3. The server responds with the input_audio_buffer.committed event.

  4. The client sends the response.create event and waits for the server to return the model output.

  5. The server responds with the conversation.item.created event.
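
The client side of one manual-mode turn can be sketched as an ordered list of JSON messages. This is an illustration, not the SDK's implementation: build_manual_turn is a hypothetical function, the 3200-byte chunk size is an arbitrary default, and the "image" field name for input_image_buffer.append is an assumption (the "audio" field matches the WebSocket sample on this page).

```python
import base64
import json


def build_manual_turn(audio_pcm: bytes, images=(), chunk_size: int = 3200):
    """Return the JSON-encoded client events for one manual-mode turn, in send order."""
    if not audio_pcm:
        # Audio must come first: images require at least one prior audio append.
        raise ValueError("A manual turn needs at least one audio chunk")
    events = []
    # 1. Append the audio in fixed-size chunks, Base64-encoded.
    for start in range(0, len(audio_pcm), chunk_size):
        chunk = audio_pcm[start:start + chunk_size]
        events.append({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode(),
        })
    # 2. Append any images after the audio.
    for image in images:
        events.append({
            "type": "input_image_buffer.append",
            "image": base64.b64encode(image).decode(),  # field name is an assumption
        })
    # 3. Commit the buffers, then ask the server for a response.
    events.append({"type": "input_audio_buffer.commit"})
    events.append({"type": "response.create"})
    return [json.dumps(event) for event in events]


messages = build_manual_turn(b"\x00" * 7000, images=[b"fake-image-bytes"])
print([json.loads(m)["type"] for m in messages])
```

Each returned string can be passed to the WebSocket send call in order; the server's input_audio_buffer.committed event is expected after the commit message.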

| Lifecycle | Client events | Server events |
|---|---|---|
| Session initialization | session.update: Session configuration | session.created: Session created |
| | | session.updated: Session configuration updated |
| User audio input | input_audio_buffer.append: Add audio to the buffer | input_audio_buffer.committed: Server received the submitted audio |
| | input_image_buffer.append: Add an image to the buffer | |
| | input_audio_buffer.commit: Commit audio and images to the server | |
| | response.create: Create a model response | |
| Server audio output | input_audio_buffer.clear: Clear audio from the buffer | response.created: Server starts generating a response |
| | | response.output_item.added: New output content during the response |
| | | conversation.item.created: Conversation item created |
| | | response.content_part.added: New output content added to the assistant message item |
| | | response.audio_transcript.delta: Incrementally generated transcript text |
| | | response.audio.delta: Incrementally generated audio from the model |
| | | response.audio_transcript.done: Text transcription complete |
| | | response.audio.done: Audio generation complete |
| | | response.content_part.done: Streaming of text or audio content for the assistant message complete |
| | | response.output_item.done: Streaming of the entire output item for the assistant message complete |
| | | response.done: Response complete |

Web search

The web search feature lets the model reply using data retrieved in real time. Use it for scenarios that need up-to-date information, such as stock prices or weather forecasts. The model decides autonomously whether a web search is needed to answer your question.

Only Qwen3.5-Omni-Realtime models support web search. It is disabled by default. Enable it with the session.update event.
For billing details, see the agent strategy in the billing documentation.

How to enable

In the session.update event, add the following parameters:

  • enable_search: Set to true to enable web search.

  • search_options.enable_source: Set to true to return the list of search result sources.

For complete parameter details, see session.update.

Response format

After you enable web search, the response.done event includes a new plugins field in the usage object. This field records search usage metrics:

{
    "usage": {
        "total_tokens": 2937,
        "input_tokens": 2554,
        "output_tokens": 383,
        "input_tokens_details": {
            "text_tokens": 2512,
            "audio_tokens": 42
        },
        "output_tokens_details": {
            "text_tokens": 90,
            "audio_tokens": 293
        },
        "plugins": {
            "search": {
                "count": 1,
                "strategy": "agent"
            }
        }
    }
}

Sample code

The following examples show how to enable web search in a real-time conversation.

DashScope Python SDK

In the update_session call, pass the enable_search and search_options parameters:

import os
import base64
import time
import json
import pyaudio
from dashscope.audio.qwen_omni import MultiModality, AudioFormat, OmniRealtimeCallback, OmniRealtimeConversation
import dashscope

dashscope.api_key = os.getenv('DASHSCOPE_API_KEY')
url = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
model = 'qwen3.5-omni-plus-realtime'
voice = 'Tina'

class SearchCallback(OmniRealtimeCallback):
    def __init__(self, pya):
        self.pya = pya
        self.out = None
    def on_open(self):
        self.out = self.pya.open(format=pyaudio.paInt16, channels=1, rate=24000, output=True)
    def on_event(self, response):
        if response['type'] == 'response.audio.delta':
            self.out.write(base64.b64decode(response['delta']))
        elif response['type'] == 'conversation.item.input_audio_transcription.completed':
            print(f"[User] {response['transcript']}")
        elif response['type'] == 'response.audio_transcript.done':
            print(f"[LLM] {response['transcript']}")
        elif response['type'] == 'response.done':
            usage = response.get('response', {}).get('usage', {})
            plugins = usage.get('plugins', {})
            if plugins.get('search'):
                print(f"[Search] count={plugins['search']['count']}, strategy={plugins['search']['strategy']}")

pya = pyaudio.PyAudio()
callback = SearchCallback(pya)
conv = OmniRealtimeConversation(model=model, callback=callback, url=url)
conv.connect()
conv.update_session(
    output_modalities=[MultiModality.AUDIO, MultiModality.TEXT],
    voice=voice,
    instructions="You are Xiao Yun, a personal assistant",
    enable_search=True,
    search_options={'enable_source': True}
)
mic = pya.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True)
print("Web search is enabled. Speak into the microphone (press Ctrl+C to exit)...")
try:
    while True:
        audio_data = mic.read(3200, exception_on_overflow=False)
        conv.append_audio(base64.b64encode(audio_data).decode())
        time.sleep(0.01)
except KeyboardInterrupt:
    conv.close()
    mic.close()
    callback.out.close()
    pya.terminate()
    print("\nConversation ended")

DashScope Java SDK

In updateSession, pass the web search settings through the parameters map:

import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.nio.ByteBuffer;
import java.util.*;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class OmniSearch {
    static class SequentialAudioPlayer {
        private final SourceDataLine line;
        private final Queue<byte[]> audioQueue = new ConcurrentLinkedQueue<>();
        private final Thread playerThread;
        private final AtomicBoolean shouldStop = new AtomicBoolean(false);

        public SequentialAudioPlayer() throws LineUnavailableException {
            AudioFormat format = new AudioFormat(24000, 16, 1, true, false);
            line = AudioSystem.getSourceDataLine(format);
            line.open(format);
            line.start();
            playerThread = new Thread(() -> {
                while (!shouldStop.get()) {
                    byte[] audio = audioQueue.poll();
                    if (audio != null) {
                        line.write(audio, 0, audio.length);
                    } else {
                        try { Thread.sleep(10); } catch (InterruptedException ignored) {}
                    }
                }
            }, "AudioPlayer");
            playerThread.start();
        }

        public void play(String base64Audio) {
            audioQueue.add(Base64.getDecoder().decode(base64Audio));
        }
        public void close() {
            shouldStop.set(true);
            try { playerThread.join(1000); } catch (InterruptedException ignored) {}
            line.drain();
            line.close();
        }
    }

    public static void main(String[] args) {
        try {
            SequentialAudioPlayer player = new SequentialAudioPlayer();
            AtomicBoolean shouldStop = new AtomicBoolean(false);

            OmniRealtimeParam param = OmniRealtimeParam.builder()
                    .model("qwen3.5-omni-plus-realtime")
                    .apikey(System.getenv("DASHSCOPE_API_KEY"))
                    .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                    .build();

            OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, new OmniRealtimeCallback() {
                @Override public void onOpen() {
                    System.out.println("Connection established");
                }
                @Override public void onClose(int code, String reason) {
                    System.out.println("Connection closed");
                    shouldStop.set(true);
                }
                @Override public void onEvent(JsonObject event) {
                    String type = event.get("type").getAsString();
                    if ("response.audio.delta".equals(type)) {
                        player.play(event.get("delta").getAsString());
                    } else if ("response.audio_transcript.done".equals(type)) {
                        System.out.println("[LLM] " + event.get("transcript").getAsString());
                    } else if ("response.done".equals(type)) {
                        JsonObject response = event.getAsJsonObject("response");
                        if (response != null && response.has("usage")) {
                            JsonObject usage = response.getAsJsonObject("usage");
                            if (usage.has("plugins")) {
                                JsonObject plugins = usage.getAsJsonObject("plugins");
                                if (plugins.has("search")) {
                                    JsonObject search = plugins.getAsJsonObject("search");
                                    System.out.println("[Search] count=" + search.get("count").getAsInt()
                                            + ", strategy=" + search.get("strategy").getAsString());
                                }
                            }
                        }
                    }
                }
            });

            conversation.connect();
            conversation.updateSession(OmniRealtimeConfig.builder()
                    .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
                    .voice("Tina")
                    .enableTurnDetection(true)
                    .enableInputAudioTranscription(true)
                    .parameters(Map.of(
                            "instructions", "You are Xiao Yun, a personal assistant",
                            "enable_search", true,
                            "search_options", Map.of("enable_source", true)
                    ))
                    .build()
            );

            System.out.println("Web search is enabled. Start speaking (press Ctrl+C to exit)...");
            AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
            TargetDataLine mic = AudioSystem.getTargetDataLine(format);
            mic.open(format);
            mic.start();

            ByteBuffer buffer = ByteBuffer.allocate(3200);
            while (!shouldStop.get()) {
                int bytesRead = mic.read(buffer.array(), 0, buffer.capacity());
                if (bytesRead > 0) {
                    // Encode only the bytes actually read, not the whole buffer
                    conversation.appendAudio(Base64.getEncoder().encodeToString(Arrays.copyOf(buffer.array(), bytesRead)));
                }
                Thread.sleep(20);
            }

            conversation.close(1000, "Normal end");
            player.close();
            mic.close();
        } catch (NoApiKeyException e) {
            System.err.println("API key not found: Set the DASHSCOPE_API_KEY environment variable");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

WebSocket (Python)

In the JSON payload for session.update, add the enable_search and search_options fields:

import json
import os
import websocket
import base64
import pyaudio
import threading

API_KEY = os.getenv("DASHSCOPE_API_KEY")
API_URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime?model=qwen3.5-omni-plus-realtime"

pya = pyaudio.PyAudio()
out_stream = pya.open(format=pyaudio.paInt16, channels=1, rate=24000, output=True)

def on_open(ws):
    ws.send(json.dumps({
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "voice": "Tina",
            "instructions": "You are Xiao Yun, a personal assistant",
            "input_audio_format": "pcm",
            "output_audio_format": "pcm",
            "enable_search": True,
            "search_options": {
                "enable_source": True
            }
        }
    }))
    print("Web search is enabled. Speak into the microphone...")
    def send_audio():
        mic = pya.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True)
        try:
            while True:
                audio = mic.read(3200, exception_on_overflow=False)
                ws.send(json.dumps({
                    "type": "input_audio_buffer.append",
                    "audio": base64.b64encode(audio).decode()
                }))
        except Exception:
            mic.close()
    threading.Thread(target=send_audio, daemon=True).start()

def on_message(ws, message):
    event = json.loads(message)
    if event["type"] == "response.audio.delta":
        out_stream.write(base64.b64decode(event["delta"]))
    elif event["type"] == "response.audio_transcript.done":
        print(f"[LLM] {event['transcript']}")
    elif event["type"] == "response.done":
        usage = event.get("response", {}).get("usage", {})
        plugins = usage.get("plugins", {})
        if plugins.get("search"):
            print(f"[Search] count={plugins['search']['count']}, strategy={plugins['search']['strategy']}")

def on_error(ws, error):
    print(f"Error: {error}")

headers = ["Authorization: Bearer " + API_KEY]
ws = websocket.WebSocketApp(API_URL, header=headers, on_open=on_open, on_message=on_message, on_error=on_error)
ws.run_forever()

API reference

Billing and rate limiting

Billing rules

Qwen-Omni-Realtime models are billed by token usage for each modality (audio and image). For details, see the Model List.

Rules for converting audio and images to tokens

Audio

  • Qwen3.5-Omni-Realtime: Total tokens = Audio duration (in seconds) × 7

  • Qwen3-Omni-Flash-Realtime: Total tokens = Audio duration (in seconds) × 12.5

  • Qwen-Omni-Turbo-Realtime: Total tokens = Audio duration (in seconds) × 25. If the audio is shorter than 1 second, it is counted as 1 second.
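
As a worked example of the audio rules above, the sketch below applies the per-second rates to a given duration. The function and dictionary names are hypothetical, and the result is left unrounded because how fractional tokens are rounded for billing is not stated here.

```python
# Tokens per second of audio, taken from the rules above.
AUDIO_TOKENS_PER_SECOND = {
    "Qwen3.5-Omni-Realtime": 7,
    "Qwen3-Omni-Flash-Realtime": 12.5,
    "Qwen-Omni-Turbo-Realtime": 25,
}


def estimate_audio_tokens(model_family: str, duration_seconds: float) -> float:
    """Estimate audio tokens for a model family and an audio duration in seconds."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    rate = AUDIO_TOKENS_PER_SECOND[model_family]
    # Only the Turbo rule states a 1-second minimum, so apply it only there.
    if model_family == "Qwen-Omni-Turbo-Realtime" and duration_seconds < 1:
        duration_seconds = 1
    return duration_seconds * rate


print(estimate_audio_tokens("Qwen3.5-Omni-Realtime", 10))      # 10 s × 7
print(estimate_audio_tokens("Qwen3-Omni-Flash-Realtime", 4))   # 4 s × 12.5
print(estimate_audio_tokens("Qwen-Omni-Turbo-Realtime", 0.5))  # counted as 1 s × 25
```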

Images

  • Qwen3.5-Omni-Plus-Realtime models: 1 token per 32×32 pixels

  • Qwen3-Omni-Flash-Realtime models: 1 token per 32×32 pixels

  • Qwen-Omni-Turbo-Realtime models: 1 token per 28×28 pixels

Each image requires at least 4 tokens and supports at most 1,280 tokens. You can use the following code to estimate the total number of tokens consumed by images:

# Install the Pillow library with: pip install Pillow
from PIL import Image
import math

# For Qwen-Omni-Turbo-Realtime models, the scaling factor is 28.
# factor = 28
# For Qwen3-Omni-Flash-Realtime and Qwen3.5-Omni-Realtime models, the scaling factor is 32.
factor = 32

def token_calculate(image_path='', duration=10):
    """
    :param image_path: Path to the image.
    :param duration: Duration of the session connection.
    :return: Number of tokens for the image.
    """
    if len(image_path) > 0:
        # Open the specified image file.
        image = Image.open(image_path)
        # Get the original dimensions of the image.
        height = image.height
        width = image.width
        print(f"Image dimensions before scaling: height={height}, width={width}")
        # Adjust the height to an integer multiple of the factor.
        h_bar = round(height / factor) * factor
        # Adjust the width to an integer multiple of the factor.
        w_bar = round(width / factor) * factor
        # Lower bound for image tokens: 4 tokens.
        min_pixels = factor * factor * 4
        # Upper bound for image tokens: 1,280 tokens.
        max_pixels = 1280 * factor * factor
        # Scale the image so that the total pixel count falls within [min_pixels, max_pixels].
        if h_bar * w_bar > max_pixels:
            # Compute the scaling factor beta so that the scaled image does not exceed max_pixels.
            beta = math.sqrt((height * width) / max_pixels)
            # Recompute the adjusted height so that it is an integer multiple of the factor.
            h_bar = math.floor(height / beta / factor) * factor
            # Recompute the adjusted width so that it is an integer multiple of the factor.
            w_bar = math.floor(width / beta / factor) * factor
        elif h_bar * w_bar < min_pixels:
            # Compute the scaling factor beta so that the scaled image is not smaller than min_pixels.
            beta = math.sqrt(min_pixels / (height * width))
            # Recompute the adjusted height so that it is an integer multiple of the factor.
            h_bar = math.ceil(height * beta / factor) * factor
            # Recompute the adjusted width so that it is an integer multiple of the factor.
            w_bar = math.ceil(width * beta / factor) * factor
        print(f"Image dimensions after scaling: height={h_bar}, width={w_bar}")
        # Compute the number of tokens for the image: total pixels divided by (factor × factor).
        token = int((h_bar * w_bar) / (factor * factor))
        print(f"Number of tokens after scaling: {token}")
        total_token = token * math.ceil(duration / 2)
        print(f"Total number of tokens: {total_token}")
        return total_token
    else:
        print("Error: image_path is empty. Cannot calculate tokens.")
        return 0

if __name__ == "__main__":
    total_token = token_calculate(image_path="xxx/test.jpg", duration=10)

Rate limiting

For more information about model rate limiting rules, see Rate limiting.

Error codes

If a model call fails and returns an error message, see Error messages for how to resolve it.

Voice list

Set the voice request parameter to a value from the voice parameter column.

qwen3.5-omni-realtime

voice parameter

Details

Supported languages

Tina

Voice name: Tina

Description: My voice is like warm milk tea: sweet and comforting, yet sharp when solving problems.

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Cindy

Voice name: Cindy

Description: A sweet young woman from Taiwan

Chinese (Taiwanese accent), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Liora Mira

Voice name: Qinghuan Liora Mira

Description: A gentle voice that weaves warmth into everyday life

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Sunnybobi

Voice name: Sunnybobi

Description: A cheerful, socially awkward girl next door

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Raymond

Voice name: Lin Chuanye (Raymond)

Description: Someone with a clear voice who loves packaged food and enjoys staying at home.

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Ethan

Voice name: Chenxu Ethan

Description: Standard Mandarin with a slight northern accent. Bright, warm, energetic, and youthful

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Theo Calm

Voice name: Theo Calm

Description: Conveys understanding in silence and healing through words

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Serena

Voice name: Serena

Description: A gentle young woman

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Harvey

Voice name: Harvey

Description: My voice carries the weight of time: deep, gentle, and scented with coffee and old books

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Maia

Voice name: Maia

Description: A blend of intelligence and gentleness

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Evan

Voice name: Evan

Description: A college student, young and charming

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Qiao

Voice name: Qiao

Description: Not just cute: sweet on the surface and full of personality underneath

Chinese (Taiwanese accent), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Momo

Voice name: Momo

Description: Playful and mischievous, here to cheer you up

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Wil

Voice name: Wil

Description: A young man from Shenzhen who speaks with a Hong Kong-Taiwan accent

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Angel

Voice name: Tai Pu – An Qi Angel

Description: A slight Taiwanese accent, and very sweet

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Li Cassian

Voice name: Dongchang - Grand Eunuch Li Cassian

Description: Speaks with restraint: three parts silence, seven parts reading the room

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Mia

Voice name: Gentle Lifestyle Blogger - Shuran Mia

Description: A lifestyle artist who shares slow-living aesthetics and everyday comfort through a soothing voice

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Joyner

Voice name: Comedy Specialist - Adou Joyner

Description: Funny, exaggerated, and down-to-earth

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Gold

Voice name: Gold

Description: A West Coast Black rapper

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Katerina

Voice name: Katerina

Description: A mature, commanding voice with rich rhythm and resonance

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Ryan

Voice name: Sweet Tea Ryan

Description: Energetic delivery with a strong dramatic presence, where realism meets intensity

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Jennifer

Voice name: Jennifer

Description: A premium, cinematic-quality American female voice

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Aiden

Voice name: Aiden

Description: A young American man who is skilled at cooking

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Mione

Voice name: Mione

Description: A mature, intelligent British girl next door

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Sunny

Voice name: Sichuan - Qing'er Sunny

Description: A sweet Sichuan girl who warms your heart

Chinese (Sichuan dialect), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Dylan

Voice name: Beijing–Xiaodong (Dylan)

Description: A young man who grew up in Beijing's hutongs

Chinese (Beijing dialect), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Eric

Voice name: Sichuan - Cheng Chuan Eric

Description: A lively man from Chengdu, Sichuan

Chinese (Sichuan dialect), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Peter

Voice name: Tianjin–Li Bide (Peter)

Description: A Tianjin xiangsheng performer who specializes in the supporting role

Chinese (Tianjin dialect), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Joseph Chen

Voice name: Joseph Chen

Description: I am Uncle Pu. My real name is Chen Zhipu, a long-time overseas Chinese from Southeast Asia

Chinese (Hokkien), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Marcus

Voice name: Shaanxi–Qinchuan Marcus

Description: Broad face, few words, sincere heart, deep voice: the authentic flavor of Shaanxi

Chinese (Shaanxi dialect), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Li

Voice name: Nanjing–Lao Li

Description: A grumpy uncle

Chinese (Nanjing dialect), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Rocky

Voice name: Cantonese – Ah Qiang Rocky

Description: Ah Qiang, an online chat companion with humor and wit.

Chinese (Cantonese), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Sohee

Voice name: Sohee

Description: A warm, cheerful, and emotionally expressive Korean unnie

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Lenn

Voice name: Lenn

Description: Rational at the core, rebellious in the details: a young German man who wears a suit and listens to post-punk

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Ono Anna

Voice name: Ono Anna

Description: A clever, playful childhood friend

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Sonrisa

Voice name: Sonrisa

Description: A warm, outgoing Latin American woman

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Bodega

Voice name: Bodega

Description: A warm, enthusiastic Spanish man

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Emilien

Voice name: Emilien

Description: A romantic French older brother

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Andre

Voice name: Andre

Description: A magnetic, natural, and steady male voice

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Radio Gol

Voice name: Radio Gol

Description: I am the Football Poet of Rádio Gol! Today I will commentate on the match using only the players' names.

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Alek

Voice name: Alek

Description: Cold like the Russian spirit, yet warm like the wool beneath an overcoat

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Rizky

Voice name: Rizky

Description: A young Indonesian man with a distinctive voice

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Roya

Voice name: Roya

Description: A sporty girl with a free spirit

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Arda

Voice name: Arda

Description: Neither too high nor too low: clean, clear, and gently warm

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Hana

Voice name: Hana

Description: A mature Vietnamese woman who loves dogs

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Dolce

Voice name: Dolce

Description: A laid-back Italian man

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Jakub

Voice name: Jakub

Description: An artistic, charismatic young man from a Polish city

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Griet

Voice name: Griet

Description: A mature, artistic Dutch woman

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Eliška

Voice name: Eliška

Description: Every word carries Central European craftsmanship and warmth

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Marina

Voice name: Marina

Description: A girl who grew up in a multicultural city

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Siiri

Voice name: Siiri

Description: Quiet and gentle, with a speaking pace as calm as a lake

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Ingrid

Voice name: Ingrid

Description: A woman from the Norwegian countryside

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Sigga

Voice name: Sigga

Description: An intellectual young woman from an Icelandic city

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Bea

Voice name: Bea

Description: A sweet Filipino woman who loves coffee

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

Chloe

Voice name: Chloe

Description: A Malaysian office worker

Chinese (Mandarin), Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

qwen3-omni-flash-realtime-2025-12-01

Voice name

voice parameter

Description

Supported languages

Qianyue

Cherry

A cheerful, positive, friendly, and natural young woman

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Suyao

Serena

A gentle young woman

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Chenxu

Ethan

Standard Mandarin with a slight northern accent. Bright, warm, energetic, and spirited

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Qianxue

Chelsie

A two-dimensional virtual girlfriend

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Motu

Momo

Playful and mischievous, designed to cheer you up

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

13

Vivian

A feisty, cute, and slightly short-tempered young woman

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Yuebai

Moon

Yue Bai, spontaneous and handsome

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

April

Maia

A blend of intelligence and gentleness

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Kai

Kai

A soothing experience for your ears

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Bu Chi Yu

Nofish

A designer who cannot pronounce retroflex sounds

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Mengbao

Bella

A little girl who drinks but never throws punches when drunk

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Jennifer

Jennifer

A premium, cinematic-quality American English female voice

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Tiancha

Ryan

Full of rhythm and dramatic expression, balancing authenticity and intensity

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Kajielina

Katerina

A mature female voice with a rich, memorable rhythm

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Aideng

Aiden

A young American English-speaking man who is skilled at cooking

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Cangmingzi

Eldric Sage

A calm, wise old man whose voice evokes the resilience of a pine tree and clarity of mind

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Good little sister

Mia

Gentle as spring water, obedient as fresh snow

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Sha Xiaomi

Mochi

A bright, clever youth, childlike yet wise beyond his years

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Yan Zhengying

Bellona

A strong, clear voice that brings characters to life and stirs excitement, evoking heroic tales with vivid vocal expression

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Tianshu

Vincent

A distinctive husky, smoky voice that tells tales of armies and knights

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Meng Xiao Ji

Bunny

A little girl brimming with "cute" charm

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

A Wen

Neil

A professional news anchor with a steady baseline tone and precise articulation

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Mo Lecturer

Elias

Maintains academic rigor while turning complex knowledge into easily digestible cognitive modules through storytelling

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Xu Da Ye

Arthur

A plain voice, weathered by time and tobacco smoke, slowly unfolding village legends and strange tales

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Linjia Meimei

Nini

A soft, sticky voice like sweet rice cake. Her drawn-out "big brother" is so sweet it melts your bones

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Gui Po Po

Ebona

Her whisper is like a rusty key, slowly turning in the darkest corners of your mind, where childhood shadows and unknown fears hide

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Xiao Wan

Seren

A gentle, soothing voice to help you fall asleep faster. Good night and sweet dreams

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

mischievous child

Pip

A playful, mischievous, yet innocent boy. Does he remind you of Shin-chan?

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Shao Nü A Yue

Stella

Usually the sweet, absent-minded voice of a teenage girl, but when she shouts "On behalf of the moon, I will defeat you!", her voice fills with undeniable love and justice

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Bodega

Bodega

An enthusiastic Spanish man

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Sonisha

Sonrisa

A cheerful, outgoing Latin American woman

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Aleke

Alek

A voice as cold as the Russian spirit and as warm as the lining of a wool coat

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Duorche

Dolce

A laid-back Italian man

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Suxi

Sohee

A kind, cheerful, and emotionally expressive Korean unnie

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Xiao Ye Xing

Ono Anna

A clever, spirited childhood friend

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Lai En

Lenn

Rational at heart, rebellious in the details: a young German man who wears a suit and listens to post-punk

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Aimi’er’an

Emilien

A romantic French older brother

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Andele

Andre

A magnetic, natural, and steady male voice

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Ladio Ge Er

Radio Gol

The football poet of Rádio Gol! Today I will commentate on football using my name.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Shanghai – A Zhen

Jada

A lively, energetic Shanghai auntie

Shanghainese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Beijing – Xiao Dong

Dylan

A young man who grew up in Beijing's hutongs

Beijing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Nanjing – Lao Li

Li

A patient yoga instructor

Nanjing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Shaanxi – Qin Chuan

Marcus

Broad face, few words, sincere heart, deep voice: the authentic flavor of Shaanxi

Shaanxi dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Minnan – A Jie

Roy

A humorous, straightforward, lively, and down-to-earth Taiwanese man

Minnan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Tianjin – Li Peter

Peter

A Tianjin xiangsheng performer, skilled in the supporting role

Tianjin dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Sichuan – Qing Er

Sunny

A sweet Sichuan girl who warms your heart

Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Sichuan – Cheng Chuan

Eric

A Sichuan man from Chengdu who stands out from the crowd

Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Cantonese – A Qiang

Rocky

The humorous and witty A Qiang, available for online chat

Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Cantonese – A Qing

Kiki

A sweet Hong Kong girl and best friend

Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

qwen3-omni-flash-realtime, qwen3-omni-flash-realtime-2025-09-15

Voice name

voice parameter

Description

Supported languages

Qianyue

Cherry

A cheerful, positive, friendly, and natural young woman

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Chenxu

Ethan

Standard Mandarin with a slight northern accent. Bright, warm, energetic, and spirited

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Bu Chi Yu

Nofish

A designer who cannot pronounce retroflex sounds

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Jennifer

Jennifer

A premium, cinematic-quality American English female voice

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Tiancha

Ryan

Full of rhythm and dramatic expression, balancing authenticity and intensity

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Kajielina

Katerina

A mature female voice with a rich, memorable rhythm

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Mo Lecturer

Elias

Maintains academic rigor while turning complex knowledge into easily digestible cognitive modules through storytelling

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Shanghai – A Zhen

Jada

A lively, energetic Shanghai auntie

Shanghainese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Beijing – Xiao Dong

Dylan

A young man who grew up in Beijing's hutongs

Beijing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Sichuan – Qing Er

Sunny

A sweet Sichuan girl who warms your heart

Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Nanjing – Lao Li

Li

A patient yoga instructor

Nanjing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Shaanxi – Qin Chuan

Marcus

Broad face, few words, sincere heart, deep voice: the authentic flavor of Shaanxi

Shaanxi dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Minnan – A Jie

Roy

A humorous, straightforward, lively, and down-to-earth Taiwanese man

Minnan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Tianjin – Li Peter

Peter

A Tianjin xiangsheng performer, skilled in the supporting role

Tianjin dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Cantonese – A Qiang

Rocky

The humorous and witty A Qiang, available for online chat

Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Cantonese – A Qing

Kiki

A sweet Hong Kong girl and best friend

Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Sichuan – Cheng Chuan

Eric

A Sichuan man from Chengdu who stands out from the crowd

Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Qwen-Omni-Turbo-Realtime

Voice name

voice parameter

Description

Supported languages

Qianyue

Cherry

A cheerful, positive, friendly, and natural young woman

Chinese, English

Suyao

Serena

A gentle young woman

Chinese, English

Chenxu

Ethan

Standard Mandarin with a slight northern accent. Bright, warm, energetic, and spirited

Chinese, English

Qianxue

Chelsie

A two-dimensional virtual girlfriend

Chinese, English