
Alibaba Cloud Model Studio: Real-time speech recognition - Qwen

Last Updated: Feb 15, 2026

In scenarios such as live streaming, online meetings, voice chat, and smart assistants, you need to convert a continuous audio stream into text in real time. The Qwen real-time speech recognition service accepts audio streams and transcribes them with low latency.

Key features

  • High-accuracy multilingual recognition: Supports high-accuracy speech recognition for multiple languages, including Mandarin and dialects such as Cantonese and Sichuanese. For more information, see Model features.

  • Adaptation to complex environments: Handles challenging acoustic conditions and supports automatic language detection and intelligent filtering of non-human sounds.

  • Emotion recognition: Detects a range of emotional states, including surprise, calm, happiness, sadness, disgust, anger, and fear.

Deployment scope

Supported models:

International

In the international deployment mode, the endpoint and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Mainland China.

To call the following models, use an API key from the Singapore region:

Qwen3-ASR-Flash-Realtime: qwen3-asr-flash-realtime (stable version, currently equivalent to qwen3-asr-flash-realtime-2025-10-27), qwen3-asr-flash-realtime-2026-02-10 (latest snapshot version), and qwen3-asr-flash-realtime-2025-10-27 (snapshot version)

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Model inference compute resources are restricted to Mainland China.

To call the following models, use an API key from the Beijing region:

Qwen3-ASR-Flash-Realtime: qwen3-asr-flash-realtime (stable version, currently equivalent to qwen3-asr-flash-realtime-2025-10-27), qwen3-asr-flash-realtime-2026-02-10 (latest snapshot version), and qwen3-asr-flash-realtime-2025-10-27 (snapshot version)

For more information, see Model list.
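The two deployment modes differ only in the API key you use and the endpoint you connect to. The following sketch shows how the WebSocket endpoints used later in this topic could be selected per region; the helper function and region keys here are illustrative, not part of the SDK:

```python
import os

# Realtime WebSocket endpoints for the two deployment modes
# (the same URLs appear in the examples below).
ENDPOINTS = {
    "singapore": "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime",
    "beijing": "wss://dashscope.aliyuncs.com/api-ws/v1/realtime",
}

def realtime_url(region: str, model: str = "qwen3-asr-flash-realtime") -> str:
    """Build the realtime WebSocket URL for a region and model."""
    return f"{ENDPOINTS[region]}?model={model}"

# The API key must be issued in the same region as the endpoint you connect to.
api_key = os.environ.get("DASHSCOPE_API_KEY", "sk-xxx")
print(realtime_url("singapore"))
```

Mixing a Beijing API key with the Singapore endpoint (or vice versa) results in an authentication error.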

Model selection

| Scenario | Recommended model | Reason |
| --- | --- | --- |
| Intelligent quality inspection for customer service | qwen3-asr-flash-realtime-2026-02-10 | Analyzes call content and customer emotions in real time to assist agents and monitor service quality. |
| Live streaming/Short videos |  | Generates real-time subtitles for live content to reach multilingual audiences. |
| Online meetings/Interviews |  | Transcribes meeting speech in real time and quickly generates text summaries to improve the efficiency of organizing information. |

For more information, see Model features.

Get started

Use the DashScope SDK

Java

  1. Install the SDK. Make sure the DashScope SDK version is 2.22.5 or later.

  2. Obtain an API key. Set the API key as an environment variable to avoid hardcoding it in your code.

  3. Run the sample code.

    import com.alibaba.dashscope.audio.omni.*;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import com.google.gson.JsonObject;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    import javax.sound.sampled.LineUnavailableException;
    import java.io.File;
    import java.io.FileInputStream;
    import java.util.Base64;
    import java.util.Collections;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.atomic.AtomicReference;
    
    public class Qwen3AsrRealtimeUsage {
        private static final Logger log = LoggerFactory.getLogger(Qwen3AsrRealtimeUsage.class);
        private static final int AUDIO_CHUNK_SIZE = 1024; // Audio chunk size in bytes
        private static final int SLEEP_INTERVAL_MS = 30;  // Sleep interval in milliseconds
    
        public static void main(String[] args) throws InterruptedException, LineUnavailableException {
            CountDownLatch finishLatch = new CountDownLatch(1);
    
            OmniRealtimeParam param = OmniRealtimeParam.builder()
                    .model("qwen3-asr-flash-realtime")
                    // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
                    .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                    // The API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key.
                    // If you have not configured the environment variable, replace the following line with your Model Studio API key: .apikey("sk-xxx")
                    .apikey(System.getenv("DASHSCOPE_API_KEY"))
                    .build();
    
            OmniRealtimeConversation conversation = null;
            final AtomicReference<OmniRealtimeConversation> conversationRef = new AtomicReference<>(null);
            conversation = new OmniRealtimeConversation(param, new OmniRealtimeCallback() {
                @Override
                public void onOpen() {
                    System.out.println("connection opened");
                }
                @Override
                public void onEvent(JsonObject message) {
                    String type = message.get("type").getAsString();
                    switch(type) {
                        case "session.created":
                            System.out.println("start session: " + message.get("session").getAsJsonObject().get("id").getAsString());
                            break;
                        case "conversation.item.input_audio_transcription.completed":
                            System.out.println("transcription: " + message.get("transcript").getAsString());
                            finishLatch.countDown();
                            break;
                        case "input_audio_buffer.speech_started":
                            System.out.println("======VAD Speech Start======");
                            break;
                        case "input_audio_buffer.speech_stopped":
                            System.out.println("======VAD Speech Stop======");
                            break;
                        case "conversation.item.input_audio_transcription.text":
                            System.out.println("transcription: " + message.get("text").getAsString());
                            break;
                        default:
                            break;
                    }
                }
                @Override
                public void onClose(int code, String reason) {
                    System.out.println("connection closed code: " + code + ", reason: " + reason);
                }
            });
            conversationRef.set(conversation);
            try {
                conversation.connect();
            } catch (NoApiKeyException e) {
                throw new RuntimeException(e);
            }
    
            OmniRealtimeTranscriptionParam transcriptionParam = new OmniRealtimeTranscriptionParam();
            transcriptionParam.setLanguage("zh");
            transcriptionParam.setInputAudioFormat("pcm");
            transcriptionParam.setInputSampleRate(16000);
    
            OmniRealtimeConfig config = OmniRealtimeConfig.builder()
                    .modalities(Collections.singletonList(OmniRealtimeModality.TEXT))
                    .transcriptionConfig(transcriptionParam)
                    .build();
            conversation.updateSession(config);
    
            String filePath = "your_audio_file.pcm";
            File audioFile = new File(filePath);
            if (!audioFile.exists()) {
                log.error("Audio file not found: {}", filePath);
                return;
            }
    
            try (FileInputStream audioInputStream = new FileInputStream(audioFile)) {
                byte[] audioBuffer = new byte[AUDIO_CHUNK_SIZE];
                int bytesRead;
                int totalBytesRead = 0;
    
                log.info("Starting to send audio data from: {}", filePath);
    
                // Read and send the audio data in chunks.
                while ((bytesRead = audioInputStream.read(audioBuffer)) != -1) {
                    totalBytesRead += bytesRead;
                    // Encode only the bytes actually read so the final partial chunk is not padded with stale buffer data.
                    String audioB64 = Base64.getEncoder().encodeToString(java.util.Arrays.copyOf(audioBuffer, bytesRead));
                    // Send the audio chunk to the conversation.
                    conversation.appendAudio(audioB64);

                    // Add a short pause to simulate real-time audio streaming.
                    Thread.sleep(SLEEP_INTERVAL_MS);
                }
    
                log.info("Finished sending audio data. Total bytes sent: {}", totalBytesRead);
    
            } catch (Exception e) {
                log.error("Error sending audio from file: {}", filePath, e);
            }
    
            // Send session.finish, wait for the session to complete, then close the connection.
            conversation.endSession();
            // Wait until the final transcription result has been received.
            finishLatch.await();
            log.info("Task finished");

            System.exit(0);
        }
    }

Python

  1. Install the SDK. Make sure the DashScope SDK version is 1.25.6 or later.

  2. Obtain an API key. Set the API key as an environment variable to avoid hardcoding it in your code.

  3. Run the sample code.

    import logging
    import os
    import base64
    import signal
    import sys
    import time
    import dashscope
    from dashscope.audio.qwen_omni import *
    from dashscope.audio.qwen_omni.omni_realtime import TranscriptionParams
    
    
    def setup_logging():
        """Konfigurasi logging."""
        logger = logging.getLogger('dashscope')
        logger.setLevel(logging.DEBUG)
        handler = logging.StreamHandler(sys.stdout)
        handler.setLevel(logging.DEBUG)
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        logger.propagate = False
        return logger
    
    
    def init_api_key():
        """Inisialisasi Kunci API."""
        # Kunci API untuk Wilayah Singapura dan Beijing berbeda. Untuk mendapatkan Kunci API, lihat https://www.alibabacloud.com/help/zh/model-studio/get-api-key.
        # Jika Anda belum mengonfigurasi variabel lingkungan, ganti baris berikut dengan Kunci API Model Studio Anda: dashscope.api_key = "sk-xxx"
        dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY', 'YOUR_API_KEY')
        if dashscope.api_key == 'YOUR_API_KEY':
            print('[Warning] Using placeholder API key, set DASHSCOPE_API_KEY environment variable.')
    
    
    class MyCallback(OmniRealtimeCallback):
        """Menangani callback pengenalan real-time."""
        def __init__(self, conversation):
            self.conversation = conversation
            self.handlers = {
                'session.created': self._handle_session_created,
                'conversation.item.input_audio_transcription.completed': self._handle_final_text,
                'conversation.item.input_audio_transcription.text': self._handle_stash_text,
                'input_audio_buffer.speech_started': lambda r: print('======Speech Start======'),
                'input_audio_buffer.speech_stopped': lambda r: print('======Speech Stop======')
            }
    
        def on_open(self):
            print('Connection opened')
    
        def on_close(self, code, msg):
            print(f'Connection closed, code: {code}, msg: {msg}')
    
        def on_event(self, response):
            try:
                handler = self.handlers.get(response['type'])
                if handler:
                    handler(response)
            except Exception as e:
                print(f'[Error] {e}')
    
        def _handle_session_created(self, response):
            print(f"Start session: {response['session']['id']}")
    
        def _handle_final_text(self, response):
            print(f"Final recognized text: {response['transcript']}")
    
        def _handle_stash_text(self, response):
            print(f"Got stash result: {response['stash']}")
    
    
    def read_audio_chunks(file_path, chunk_size=3200):
        """Membaca file audio dalam potongan."""
        with open(file_path, 'rb') as f:
            while chunk := f.read(chunk_size):
                yield chunk
    
    
    def send_audio(conversation, file_path, delay=0.1):
        """Mengirim data audio."""
        if not os.path.exists(file_path):
            raise FileNotFoundError(f"Audio file {file_path} does not exist.")
    
        print("Processing audio file... Press 'Ctrl+C' to stop.")
        for chunk in read_audio_chunks(file_path):
            audio_b64 = base64.b64encode(chunk).decode('ascii')
            conversation.append_audio(audio_b64)
            time.sleep(delay)
    
    def main():
        setup_logging()
        init_api_key()
    
        audio_file_path = "./your_audio_file.pcm"
        conversation = OmniRealtimeConversation(
            model='qwen3-asr-flash-realtime',
            # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
            url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime',
            callback=MyCallback(conversation=None)  # Pass None for now and inject it later.
        )
    
        # Inject the conversation instance into the callback.
        conversation.callback.conversation = conversation
    
        def handle_exit(sig, frame):
            print('Ctrl+C pressed, exiting...')
            conversation.close()
            sys.exit(0)
    
        signal.signal(signal.SIGINT, handle_exit)
    
        conversation.connect()
    
        transcription_params = TranscriptionParams(
            language='zh',
            sample_rate=16000,
            input_audio_format="pcm"
        )
    
        conversation.update_session(
            output_modalities=[MultiModality.TEXT],
            enable_input_audio_transcription=True,
            transcription_params=transcription_params
        )
    
        try:
            send_audio(conversation, audio_file_path)
            # Send session.finish, wait for the session to complete, then close the connection.
            conversation.end_session()
        except Exception as e:
            print(f"Error occurred: {e}")
        finally:
            conversation.close()
            print("Audio processing completed.")
    
    
    if __name__ == '__main__':
        main()
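All of the examples in this topic stream a headerless PCM file (16-bit samples, 16 kHz, mono). If your source audio is a WAV file with those properties, you can strip the header with the standard-library wave module; this is a minimal sketch and assumes the input is already 16 kHz, 16-bit, mono (use a resampler such as ffmpeg first if it is not):

```python
import wave

def wav_to_pcm(wav_path: str, pcm_path: str) -> None:
    """Extract raw PCM frames from a WAV file that is already 16-bit, 16 kHz, mono."""
    with wave.open(wav_path, "rb") as wav:
        assert wav.getframerate() == 16000, "expected a 16 kHz file"
        assert wav.getnchannels() == 1, "expected mono audio"
        assert wav.getsampwidth() == 2, "expected 16-bit samples"
        frames = wav.readframes(wav.getnframes())
    # Write the frames without any container header.
    with open(pcm_path, "wb") as pcm:
        pcm.write(frames)
```

The resulting file can be passed directly as `your_audio_file.pcm` in the samples above.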

Use the WebSocket API

The following examples show how to send a local audio file and retrieve recognition results over a WebSocket connection.

  1. Obtain an API key: Obtain an API key. For security, set the API key as an environment variable.

  2. Write and run the code: Implement the complete flow of authentication, connection, audio sending, and result receiving. For more information, see Interaction flow.

    Python

    Before running the example, install the dependencies by running the following commands:

    pip uninstall websocket-client
    pip uninstall websocket
    pip install websocket-client

    Do not name the sample code file websocket.py. Otherwise, the following error may occur: AttributeError: module 'websocket' has no attribute 'WebSocketApp'. Did you mean: 'WebSocket'?

    # pip install websocket-client
    import os
    import time
    import json
    import threading
    import base64
    import websocket
    import logging
    import logging.handlers
    from datetime import datetime
    
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)
    
    # The API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key.
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: API_KEY="sk-xxx"
    API_KEY = os.environ.get("DASHSCOPE_API_KEY", "sk-xxx")
    QWEN_MODEL = "qwen3-asr-flash-realtime"
    # The following is the base URL for the Singapore region. If you use a model in the Beijing region, replace the base URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
    baseUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
    url = f"{baseUrl}?model={QWEN_MODEL}"
    print(f"Connecting to server: {url}")
    
    # Note: If you are not in VAD mode, the cumulative duration of continuously sent audio must not exceed 60 seconds.
    enableServerVad = True
    is_running = True  # Running flag.
    
    headers = [
        "Authorization: Bearer " + API_KEY,
        "OpenAI-Beta: realtime=v1"
    ]
    
    def init_logger():
        formatter = logging.Formatter('%(asctime)s|%(levelname)s|%(message)s')
        f_handler = logging.handlers.RotatingFileHandler(
            "omni_tester.log", maxBytes=100 * 1024 * 1024, backupCount=3
        )
        f_handler.setLevel(logging.DEBUG)
        f_handler.setFormatter(formatter)
    
        console = logging.StreamHandler()
        console.setLevel(logging.DEBUG)
        console.setFormatter(formatter)
    
        logger.addHandler(f_handler)
        logger.addHandler(console)
    
    def on_open(ws):
        logger.info("Connected to server.")
    
        # Session update event.
        event_manual = {
            "event_id": "event_123",
            "type": "session.update",
            "session": {
                "modalities": ["text"],
                "input_audio_format": "pcm",
                "sample_rate": 16000,
                "input_audio_transcription": {
                    # Language identifier, optional. If you know the language in advance, set it here.
                    "language": "zh"
                },
                "turn_detection": None
            }
        }
        event_vad = {
            "event_id": "event_123",
            "type": "session.update",
            "session": {
                "modalities": ["text"],
                "input_audio_format": "pcm",
                "sample_rate": 16000,
                "input_audio_transcription": {
                    "language": "zh"
                },
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.0,
                    "silence_duration_ms": 400
                }
            }
        }
        if enableServerVad:
            logger.info(f"Sending event: {json.dumps(event_vad, indent=2)}")
            ws.send(json.dumps(event_vad))
        else:
            logger.info(f"Sending event: {json.dumps(event_manual, indent=2)}")
            ws.send(json.dumps(event_manual))
    
    def on_message(ws, message):
        global is_running
        try:
            data = json.loads(message)
            logger.info(f"Received event: {json.dumps(data, ensure_ascii=False, indent=2)}")
            if data.get("type") == "session.finished":
                logger.info(f"Final transcript: {data.get('transcript')}")
                logger.info("Closing WebSocket connection after session finished...")
                is_running = False  # Stop the audio-sending thread.
                ws.close()
        except json.JSONDecodeError:
            logger.error(f"Failed to parse message: {message}")
    
    def on_error(ws, error):
        logger.error(f"Error: {error}")
    
    def on_close(ws, close_status_code, close_msg):
        logger.info(f"Connection closed: {close_status_code} - {close_msg}")
    
    def send_audio(ws, local_audio_path):
        time.sleep(3)  # Wait for the session update to complete.
        global is_running
    
        with open(local_audio_path, 'rb') as audio_file:
            logger.info(f"Start reading the file: {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}")
            while is_running:
                audio_data = audio_file.read(3200)  # ~0.1 seconds of PCM16/16 kHz audio.
                if not audio_data:
                    logger.info(f"Finished reading the file: {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}")
                    if ws.sock and ws.sock.connected:
                        if not enableServerVad:
                            commit_event = {
                                "event_id": "event_789",
                                "type": "input_audio_buffer.commit"
                            }
                            ws.send(json.dumps(commit_event))
                        finish_event = {
                            "event_id": "event_987",
                            "type": "session.finish"
                        }
                        ws.send(json.dumps(finish_event))
                    break
    
                if not ws.sock or not ws.sock.connected:
                    logger.info("The WebSocket is closed. Stop sending audio.")
                    break
    
                encoded_data = base64.b64encode(audio_data).decode('utf-8')
                eventd = {
                    "event_id": f"event_{int(time.time() * 1000)}",
                    "type": "input_audio_buffer.append",
                    "audio": encoded_data
                }
                ws.send(json.dumps(eventd))
                logger.info(f"Sending audio event: {eventd['event_id']}")
                time.sleep(0.1)  # Simulate real-time capture.
    
    # Initialize the logger.
    init_logger()
    logger.info(f"Connecting to WebSocket server at {url}...")
    
    local_audio_path = "your_audio_file.pcm"
    ws = websocket.WebSocketApp(
        url,
        header=headers,
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close
    )
    
    thread = threading.Thread(target=send_audio, args=(ws, local_audio_path))
    thread.start()
    ws.run_forever()
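    The 3200-byte chunk size above is not arbitrary: 16-bit (2-byte) samples at 16 kHz come to 32,000 bytes per second, so 3200 bytes is exactly 0.1 seconds of audio, matching the time.sleep(0.1) pacing. If you change the pacing or the audio format in your session configuration, the chunk size should change with it; a small helper sketch (the function name and defaults are illustrative):

```python
def chunk_bytes(seconds: float, sample_rate: int = 16000,
                bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Number of raw PCM bytes that cover the given duration."""
    return int(seconds * sample_rate * bytes_per_sample * channels)

# 0.1 s of PCM16/16 kHz mono audio:
print(chunk_bytes(0.1))  # 3200
```

    Sending chunks much larger than the pacing interval makes the stream burstier than real time; much smaller chunks add per-message overhead.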

    Java

    Before running the example, install the Java-WebSocket dependency:

    Maven

    <dependency>
        <groupId>org.java-websocket</groupId>
        <artifactId>Java-WebSocket</artifactId>
        <version>1.5.6</version>
    </dependency>

    Gradle

    implementation 'org.java-websocket:Java-WebSocket:1.5.6'

    import org.java_websocket.client.WebSocketClient;
    import org.java_websocket.handshake.ServerHandshake;
    import org.json.JSONObject;
    
    import java.net.URI;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Base64;
    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.logging.*;
    
    public class QwenASRRealtimeClient {
    
        private static final Logger logger = Logger.getLogger(QwenASRRealtimeClient.class.getName());
        // The API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key.
        // If you have not configured the environment variable, replace the following line with your Model Studio API key: private static final String API_KEY = "sk-xxx"
        private static final String API_KEY = System.getenv().getOrDefault("DASHSCOPE_API_KEY", "sk-xxx");
        private static final String MODEL = "qwen3-asr-flash-realtime";
    
        // Controls whether to use VAD mode.
        private static final boolean enableServerVad = true;
    
        private static final AtomicBoolean isRunning = new AtomicBoolean(true);
        private static WebSocketClient client;
    
        public static void main(String[] args) throws Exception {
            initLogger();
    
            // The following is the base URL for the Singapore region. If you use a model in the Beijing region, replace the base URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
            String baseUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime";
            String url = baseUrl + "?model=" + MODEL;
            logger.info("Connecting to server: " + url);
    
            client = new WebSocketClient(new URI(url)) {
                @Override
                public void onOpen(ServerHandshake handshake) {
                    logger.info("Connected to server.");
                    sendSessionUpdate();
                }
    
                @Override
                public void onMessage(String message) {
                    try {
                        JSONObject data = new JSONObject(message);
                        String eventType = data.optString("type");
    
                        logger.info("Received event: " + data.toString(2));
    
                        // When the finished event is received, stop the sending thread and close the connection.
                        if ("session.finished".equals(eventType)) {
                            logger.info("Final transcript: " + data.optString("transcript"));
                            logger.info("Closing WebSocket connection after session finished...");
    
                            isRunning.set(false); // Stop the audio-sending thread.
                            if (this.isOpen()) {
                                this.close(1000, "ASR finished");
                            }
                        }
                    } catch (Exception e) {
                        logger.severe("Failed to parse message: " + message);
                    }
                }
    
                @Override
                public void onClose(int code, String reason, boolean remote) {
                    logger.info("Connection closed: " + code + " - " + reason);
                }
    
                @Override
                public void onError(Exception ex) {
                    logger.severe("Error: " + ex.getMessage());
                }
            };
    
            // Add request headers.
            client.addHeader("Authorization", "Bearer " + API_KEY);
            client.addHeader("OpenAI-Beta", "realtime=v1");
    
            client.connectBlocking(); // Block until the connection is established.
    
            // Replace with the path of the audio file to recognize.
            String localAudioPath = "your_audio_file.pcm";
            Thread audioThread = new Thread(() -> {
                try {
                    sendAudio(localAudioPath);
                } catch (Exception e) {
                    logger.severe("Audio sending thread error: " + e.getMessage());
                }
            });
            audioThread.start();
        }
    
        /** Session update event (enable or disable VAD). */
        private static void sendSessionUpdate() {
            JSONObject eventNoVad = new JSONObject()
                    .put("event_id", "event_123")
                    .put("type", "session.update")
                    .put("session", new JSONObject()
                            .put("modalities", new String[]{"text"})
                            .put("input_audio_format", "pcm")
                            .put("sample_rate", 16000)
                            .put("input_audio_transcription", new JSONObject()
                                    .put("language", "zh"))
                            .put("turn_detection", JSONObject.NULL) // Mode manual.
                    );
    
            JSONObject eventVad = new JSONObject()
                    .put("event_id", "event_123")
                    .put("type", "session.update")
                    .put("session", new JSONObject()
                            .put("modalities", new String[]{"text"})
                            .put("input_audio_format", "pcm")
                            .put("sample_rate", 16000)
                            .put("input_audio_transcription", new JSONObject()
                                    .put("language", "zh"))
                            .put("turn_detection", new JSONObject()
                                    .put("type", "server_vad")
                                    .put("threshold", 0.0)
                                    .put("silence_duration_ms", 400))
                    );
    
            if (enableServerVad) {
                logger.info("Sending event (VAD):\n" + eventVad.toString(2));
                client.send(eventVad.toString());
            } else {
                logger.info("Sending event (Manual):\n" + eventNoVad.toString(2));
                client.send(eventNoVad.toString());
            }
        }
    
        /** Stream the audio file. */
        private static void sendAudio(String localAudioPath) throws Exception {
            Thread.sleep(3000); // Wait for the session to be ready.
            byte[] allBytes = Files.readAllBytes(Paths.get(localAudioPath));
            logger.info("Start reading the file.");
    
            int offset = 0;
            while (isRunning.get() && offset < allBytes.length) {
                int chunkSize = Math.min(3200, allBytes.length - offset);
                byte[] chunk = new byte[chunkSize];
                System.arraycopy(allBytes, offset, chunk, 0, chunkSize);
                offset += chunkSize;
    
                if (client != null && client.isOpen()) {
                    String encoded = Base64.getEncoder().encodeToString(chunk);
                    JSONObject eventd = new JSONObject()
                            .put("event_id", "event_" + System.currentTimeMillis())
                            .put("type", "input_audio_buffer.append")
                            .put("audio", encoded);
    
                    client.send(eventd.toString());
                    logger.info("Sending audio event: " + eventd.getString("event_id"));
                } else {
                    break; // Avoid sending after the connection is closed.
                }
    
                Thread.sleep(100); // Simulate real-time sending.
            }
    
            logger.info("Finished reading the file.");
    
            if (client != null && client.isOpen()) {
                // A commit is required in non-VAD mode.
                if (!enableServerVad) {
                    JSONObject commitEvent = new JSONObject()
                            .put("event_id", "event_789")
                            .put("type", "input_audio_buffer.commit");
                    client.send(commitEvent.toString());
                    logger.info("Sent commit event for manual mode.");
                }
    
                JSONObject finishEvent = new JSONObject()
                        .put("event_id", "event_987")
                        .put("type", "session.finish");
                client.send(finishEvent.toString());
                logger.info("Sent finish event.");
            }
        }
    
        /** Initialize the logger. */
        private static void initLogger() {
            logger.setLevel(Level.ALL);
            Logger rootLogger = Logger.getLogger("");
            for (Handler h : rootLogger.getHandlers()) {
                rootLogger.removeHandler(h);
            }
    
            Handler consoleHandler = new ConsoleHandler();
            consoleHandler.setLevel(Level.ALL);
            consoleHandler.setFormatter(new SimpleFormatter());
            logger.addHandler(consoleHandler);
        }
    }

    Node.js

    Before running the example, install the dependency by running the following command:

    npm install ws

    /**
     * Qwen-ASR Realtime WebSocket Client (Node.js version)
     * Features:
     * - Supports VAD and Manual modes.
     * - Sends session.update to start the session.
     * - Continuously sends input_audio_buffer.append audio chunks.
     * - Sends input_audio_buffer.commit in Manual mode.
     * - Sends the session.finish event.
     * - Closes the connection after receiving the session.finished event.
     */
    
    import WebSocket from 'ws';
    import fs from 'fs';
    
    // ===== Configuration =====
    // The API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key.
    // If you have not configured the environment variable, replace the following line with your Model Studio API key: const API_KEY = "sk-xxx"
    const API_KEY = process.env.DASHSCOPE_API_KEY || 'sk-xxx';
    const MODEL = 'qwen3-asr-flash-realtime';
    const enableServerVad = true; // true for VAD mode, false for Manual mode
    const localAudioPath = 'your_audio_file.pcm'; // Path to a PCM16, 16 kHz audio file
    
    // Berikut adalah URL dasar untuk Wilayah Singapura. Jika Anda menggunakan model di Wilayah Beijing, ganti URL dasar dengan wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
    const baseUrl = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime';
    const url = `${baseUrl}?model=${MODEL}`;
    
    console.log(`Connecting to server: ${url}`);
    
    // ===== State Control =====
    let isRunning = true;
    
    // ===== Create Connection =====
    const ws = new WebSocket(url, {
        headers: {
            'Authorization': `Bearer ${API_KEY}`,
            'OpenAI-Beta': 'realtime=v1'
        }
    });
    
    // ===== Event Bindings =====
    ws.on('open', () => {
        console.log('[WebSocket] Connected to server.');
        sendSessionUpdate();
        // Start streaming the audio file.
        sendAudio(localAudioPath);
    });
    
    ws.on('message', (message) => {
        try {
            const data = JSON.parse(message);
            console.log('[Received Event]:', JSON.stringify(data, null, 2));
    
            // Handle the session-finished event.
            if (data.type === 'session.finished') {
                console.log(`[Final Transcript] ${data.transcript}`);
                console.log('[Action] Closing WebSocket connection after session finished...');
                
                if (ws.readyState === WebSocket.OPEN) {
                    ws.close(1000, 'ASR finished');
                }
            }
        } catch (e) {
            console.error('[Error] Failed to parse message:', message);
        }
    });
    
    ws.on('close', (code, reason) => {
        console.log(`[WebSocket] Connection closed: ${code} - ${reason}`);
    });
    
    ws.on('error', (err) => {
        console.error('[WebSocket Error]', err);
    });
    
    // ===== Session Update =====
    function sendSessionUpdate() {
        const eventNoVad = {
            event_id: 'event_123',
            type: 'session.update',
            session: {
                modalities: ['text'],
                input_audio_format: 'pcm',
                sample_rate: 16000,
                input_audio_transcription: {
                    language: 'zh'
                },
                turn_detection: null
            }
        };
    
        const eventVad = {
            event_id: 'event_123',
            type: 'session.update',
            session: {
                modalities: ['text'],
                input_audio_format: 'pcm',
                sample_rate: 16000,
                input_audio_transcription: {
                    language: 'zh'
                },
                turn_detection: {
                    type: 'server_vad',
                    threshold: 0.0,
                    silence_duration_ms: 400
                }
            }
        };
    
        if (enableServerVad) {
            console.log('[Send Event] VAD Mode:\n', JSON.stringify(eventVad, null, 2));
            ws.send(JSON.stringify(eventVad));
        } else {
            console.log('[Send Event] Manual Mode:\n', JSON.stringify(eventNoVad, null, 2));
            ws.send(JSON.stringify(eventNoVad));
        }
    }
    
    // ===== Send Audio File as a Stream =====
    function sendAudio(audioPath) {
        setTimeout(() => {
            console.log(`[File Read Start] ${audioPath}`);
            const buffer = fs.readFileSync(audioPath);
    
            let offset = 0;
            const chunkSize = 3200; // About 0.1 s of 16 kHz PCM16 audio (16000 samples/s * 2 bytes * 0.1 s = 3200 bytes)
    
            function sendChunk() {
                if (!isRunning) return;
                if (offset >= buffer.length) {
                    isRunning = false; // Stop sending audio.
                    console.log('[File Read End]');
                    if (ws.readyState === WebSocket.OPEN) {
                        if (!enableServerVad) {
                            const commitEvent = {
                                event_id: 'event_789',
                                type: 'input_audio_buffer.commit'
                            };
                            ws.send(JSON.stringify(commitEvent));
                            console.log('[Send Commit Event]');
                        }
    
                        const finishEvent = {
                            event_id: 'event_987',
                            type: 'session.finish'
                        };
                        ws.send(JSON.stringify(finishEvent));
                        console.log('[Send Finish Event]');
                    }
                    
                    return;
                }
    
                if (ws.readyState !== WebSocket.OPEN) {
                    console.log('[Stop] WebSocket is not open.');
                    return;
                }
    
                const chunk = buffer.slice(offset, offset + chunkSize);
                offset += chunkSize;
    
                const encoded = chunk.toString('base64');
                const appendEvent = {
                    event_id: `event_${Date.now()}`,
                    type: 'input_audio_buffer.append',
                    audio: encoded
                };
    
                ws.send(JSON.stringify(appendEvent));
                console.log(`[Send Audio Event] ${appendEvent.event_id}`);
    
                setTimeout(sendChunk, 100); // Simulate real-time streaming.
            }
    
            sendChunk();
        }, 3000); // Wait for the session configuration to complete.
    }
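
    The example above streams a local PCM file (`your_audio_file.pcm` is a placeholder). If you do not have a suitable recording at hand, a minimal sketch like the following can generate a compatible test file: one second of a 440 Hz tone as 16 kHz, 16-bit, mono, little-endian PCM. The file name `test_tone.pcm` is arbitrary.

    ```javascript
    // Generate 1 second of a 440 Hz sine tone as 16 kHz, 16-bit, mono PCM
    // and write it to disk, so it can be fed to the streaming example above.
    import fs from 'fs';

    const sampleRate = 16000;               // matches the session's sample_rate
    const seconds = 1;
    const samples = sampleRate * seconds;
    const buf = Buffer.alloc(samples * 2);  // 2 bytes per PCM16 sample

    for (let i = 0; i < samples; i++) {
        const t = i / sampleRate;
        const amplitude = Math.round(0.3 * 32767 * Math.sin(2 * Math.PI * 440 * t));
        buf.writeInt16LE(amplitude, i * 2); // little-endian 16-bit sample
    }

    fs.writeFileSync('test_tone.pcm', buf);
    console.log(`Wrote ${buf.length} bytes (${samples} samples at ${sampleRate} Hz)`);
    ```

    Run it once with Node.js (as an ES module, e.g. a `.mjs` file), then point `localAudioPath` at the generated file.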

API reference

Real-time speech recognition – Qwen API reference

Model features

The features below apply to qwen3-asr-flash-realtime, qwen3-asr-flash-realtime-2026-02-10, and qwen3-asr-flash-realtime-2025-10-27.

Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish

Supported audio formats: pcm, opus

Sample rates: 8 kHz, 16 kHz

Channels: Mono

Input format: Binary audio stream

Audio size/duration: Unlimited

Emotion recognition: Supported (always on)

Sensitive-word filtering: Not supported

Speaker diarization: Not supported

Filler-word filtering: Not supported

Timestamps: Not supported

Punctuation prediction: Supported (always on)

Inverse Text Normalization (ITN): Not supported

Voice Activity Detection (VAD): Supported (always on)

Rate limit (RPS): 20

Connection types: Java/Python SDK, WebSocket API

Pricing:

International: $0.00009/second

Mainland China: $0.000047/second
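
As a quick sanity check on the per-second rates above, the following sketch estimates the cost of transcribing a given audio duration. The helper function and its names are illustrative only, not part of any SDK.

```javascript
// Illustrative cost estimate based on the per-second rates listed above.
const RATE_PER_SECOND = {
    international: 0.00009,   // USD per second of audio
    mainlandChina: 0.000047,  // USD per second of audio
};

function estimateCost(durationSeconds, deployment = 'international') {
    return durationSeconds * RATE_PER_SECOND[deployment];
}

// One hour of audio in the international deployment:
console.log(estimateCost(3600).toFixed(4)); // prints "0.3240"
```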