In live streaming, online meetings, voice chat, and smart assistant scenarios, you need to convert a continuous audio stream into text in real time. The Qwen real-time speech recognition service accepts an audio stream and transcribes it with low latency.
Key features
High-accuracy multilingual recognition: Supports high-accuracy speech recognition for many languages, including Mandarin as well as dialects such as Cantonese and Sichuanese. For more information, see Model features.
Adaptation to complex environments: Handles challenging acoustic conditions and supports automatic language detection and intelligent filtering of non-human sounds.
Emotion recognition: Detects a range of emotional states, including surprise, calm, happiness, sadness, disgust, anger, and fear.
Deployment scope
Supported models:
International
In international deployment mode, the endpoint and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding mainland China.
To call the following models, use an API key from the Singapore region:
Qwen3-ASR-Flash-Realtime: qwen3-asr-flash-realtime (stable version, currently equivalent to qwen3-asr-flash-realtime-2025-10-27), qwen3-asr-flash-realtime-2026-02-10 (latest snapshot version), and qwen3-asr-flash-realtime-2025-10-27 (snapshot version)
Mainland China
In Mainland China deployment mode, the endpoint and data storage are located in the Beijing region. Model inference compute resources are restricted to mainland China.
To call the following models, use an API key from the Beijing region:
Qwen3-ASR-Flash-Realtime: qwen3-asr-flash-realtime (stable version, currently equivalent to qwen3-asr-flash-realtime-2025-10-27), qwen3-asr-flash-realtime-2026-02-10 (latest snapshot version), and qwen3-asr-flash-realtime-2025-10-27 (snapshot version)
For more information, see Model list.
Model selection
Scenario | Recommended model | Reason |
Intelligent quality inspection for customer service | qwen3-asr-flash-realtime-2026-02-10 | Analyzes call content and customer emotions in real time to assist agents and monitor service quality. |
Live streaming/Short videos | qwen3-asr-flash-realtime-2026-02-10 | Generates real-time captions for live content to reach multilingual audiences. |
Online meetings/Interviews | qwen3-asr-flash-realtime-2026-02-10 | Transcribes meeting speech in real time and quickly generates text summaries to improve the efficiency of organizing information. |
For more information, see Model features.
Get started
Use the DashScope SDK
Java
Install the SDK. Make sure the DashScope SDK version is 2.22.5 or later.
Obtain an API key. Set the API key as an environment variable to avoid hardcoding it in your code.
Run the sample code.
import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.sound.sampled.LineUnavailableException;
import java.io.File;
import java.io.FileInputStream;
import java.util.Arrays;
import java.util.Base64;
import java.util.Collections;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class Qwen3AsrRealtimeUsage {
    private static final Logger log = LoggerFactory.getLogger(Qwen3AsrRealtimeUsage.class);
    private static final int AUDIO_CHUNK_SIZE = 1024; // Audio chunk size in bytes
    private static final int SLEEP_INTERVAL_MS = 30;  // Sleep interval in milliseconds

    public static void main(String[] args) throws InterruptedException, LineUnavailableException {
        CountDownLatch finishLatch = new CountDownLatch(1);
        OmniRealtimeParam param = OmniRealtimeParam.builder()
                .model("qwen3-asr-flash-realtime")
                // The following URL is for the Singapore region. If you use the model in the Beijing region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // The API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key.
                // If you have not configured the environment variable, replace the following line with your Model Studio API key: .apikey("sk-xxx")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();

        final AtomicReference<OmniRealtimeConversation> conversationRef = new AtomicReference<>(null);
        OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, new OmniRealtimeCallback() {
            @Override
            public void onOpen() {
                System.out.println("connection opened");
            }

            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch (type) {
                    case "session.created":
                        System.out.println("start session: " + message.get("session").getAsJsonObject().get("id").getAsString());
                        break;
                    case "conversation.item.input_audio_transcription.completed":
                        System.out.println("transcription: " + message.get("transcript").getAsString());
                        finishLatch.countDown();
                        break;
                    case "input_audio_buffer.speech_started":
                        System.out.println("======VAD Speech Start======");
                        break;
                    case "input_audio_buffer.speech_stopped":
                        System.out.println("======VAD Speech Stop======");
                        break;
                    case "conversation.item.input_audio_transcription.text":
                        System.out.println("transcription: " + message.get("text").getAsString());
                        break;
                    default:
                        break;
                }
            }

            @Override
            public void onClose(int code, String reason) {
                System.out.println("connection closed code: " + code + ", reason: " + reason);
            }
        });
        conversationRef.set(conversation);
        try {
            conversation.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }

        OmniRealtimeTranscriptionParam transcriptionParam = new OmniRealtimeTranscriptionParam();
        transcriptionParam.setLanguage("zh");
        transcriptionParam.setInputAudioFormat("pcm");
        transcriptionParam.setInputSampleRate(16000);
        OmniRealtimeConfig config = OmniRealtimeConfig.builder()
                .modalities(Collections.singletonList(OmniRealtimeModality.TEXT))
                .transcriptionConfig(transcriptionParam)
                .build();
        conversation.updateSession(config);

        String filePath = "your_audio_file.pcm";
        File audioFile = new File(filePath);
        if (!audioFile.exists()) {
            log.error("Audio file not found: {}", filePath);
            return;
        }
        try (FileInputStream audioInputStream = new FileInputStream(audioFile)) {
            byte[] audioBuffer = new byte[AUDIO_CHUNK_SIZE];
            int bytesRead;
            int totalBytesRead = 0;
            log.info("Starting to send audio data from: {}", filePath);
            // Read and send the audio data in chunks.
            while ((bytesRead = audioInputStream.read(audioBuffer)) != -1) {
                totalBytesRead += bytesRead;
                // Encode only the bytes actually read so a short final chunk does not send stale buffer data.
                String audioB64 = Base64.getEncoder().encodeToString(Arrays.copyOf(audioBuffer, bytesRead));
                // Send the audio chunk to the conversation.
                conversation.appendAudio(audioB64);
                // Add a small delay to simulate real-time audio streaming.
                Thread.sleep(SLEEP_INTERVAL_MS);
            }
            log.info("Finished sending audio data. Total bytes sent: {}", totalBytesRead);
        } catch (Exception e) {
            log.error("Error sending audio from file: {}", filePath, e);
        }
        // Send session.finish, wait for the session to finish, then close the connection.
        conversation.endSession();
        // Wait for the final transcription result before exiting.
        finishLatch.await();
        log.info("Task finished");
        System.exit(0);
    }
}
Python
Install the SDK. Make sure the DashScope SDK version is 1.25.6 or later.
Obtain an API key. Set the API key as an environment variable to avoid hardcoding it in your code.
Run the sample code.
import logging
import os
import base64
import signal
import sys
import time

import dashscope
from dashscope.audio.qwen_omni import *
from dashscope.audio.qwen_omni.omni_realtime import TranscriptionParams


def setup_logging():
    """Configure logging."""
    logger = logging.getLogger('dashscope')
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(logging.DEBUG)
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def init_api_key():
    """Initialize the API key."""
    # The API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key.
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY', 'YOUR_API_KEY')
    if dashscope.api_key == 'YOUR_API_KEY':
        print('[Warning] Using placeholder API key, set DASHSCOPE_API_KEY environment variable.')


class MyCallback(OmniRealtimeCallback):
    """Handle real-time recognition callbacks."""

    def __init__(self, conversation):
        self.conversation = conversation
        self.handlers = {
            'session.created': self._handle_session_created,
            'conversation.item.input_audio_transcription.completed': self._handle_final_text,
            'conversation.item.input_audio_transcription.text': self._handle_stash_text,
            'input_audio_buffer.speech_started': lambda r: print('======Speech Start======'),
            'input_audio_buffer.speech_stopped': lambda r: print('======Speech Stop======')
        }

    def on_open(self):
        print('Connection opened')

    def on_close(self, code, msg):
        print(f'Connection closed, code: {code}, msg: {msg}')

    def on_event(self, response):
        try:
            handler = self.handlers.get(response['type'])
            if handler:
                handler(response)
        except Exception as e:
            print(f'[Error] {e}')

    def _handle_session_created(self, response):
        print(f"Start session: {response['session']['id']}")

    def _handle_final_text(self, response):
        print(f"Final recognized text: {response['transcript']}")

    def _handle_stash_text(self, response):
        print(f"Got stash result: {response['stash']}")


def read_audio_chunks(file_path, chunk_size=3200):
    """Read the audio file in chunks."""
    with open(file_path, 'rb') as f:
        while chunk := f.read(chunk_size):
            yield chunk


def send_audio(conversation, file_path, delay=0.1):
    """Send the audio data."""
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Audio file {file_path} does not exist.")
    print("Processing audio file... Press 'Ctrl+C' to stop.")
    for chunk in read_audio_chunks(file_path):
        audio_b64 = base64.b64encode(chunk).decode('ascii')
        conversation.append_audio(audio_b64)
        time.sleep(delay)


def main():
    setup_logging()
    init_api_key()
    audio_file_path = "./your_audio_file.pcm"
    conversation = OmniRealtimeConversation(
        model='qwen3-asr-flash-realtime',
        # The following URL is for the Singapore region. If you use the model in the Beijing region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime',
        callback=MyCallback(conversation=None)  # Pass None for now and inject later.
    )
    # Inject self into the callback.
    conversation.callback.conversation = conversation

    def handle_exit(sig, frame):
        print('Ctrl+C pressed, exiting...')
        conversation.close()
        sys.exit(0)

    signal.signal(signal.SIGINT, handle_exit)

    conversation.connect()
    transcription_params = TranscriptionParams(
        language='zh',
        sample_rate=16000,
        input_audio_format="pcm"
    )
    conversation.update_session(
        output_modalities=[MultiModality.TEXT],
        enable_input_audio_transcription=True,
        transcription_params=transcription_params
    )
    try:
        send_audio(conversation, audio_file_path)
        # Send session.finish, wait for the session to finish, then close the connection.
        conversation.end_session()
    except Exception as e:
        print(f"Error occurred: {e}")
    finally:
        conversation.close()
        print("Audio processing completed.")


if __name__ == '__main__':
    main()
Use the WebSocket API
The following examples show how to send a local audio file and retrieve recognition results over a WebSocket connection.
Obtain an API key: Get an API key. For security, set the API key as an environment variable.
Write and run the code: Implement the full flow of authentication, connection, audio transmission, and result reception. For more information, see Interaction flow.
Python
Before running the example, install the dependencies by running the following commands:
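The WebSocket samples below all stream audio in 3200-byte chunks roughly every 100 ms. That figure is not arbitrary: at 16 kHz, 16-bit (2 bytes per sample), mono PCM, 0.1 seconds of audio is exactly 3200 bytes. The following standalone sketch shows the arithmetic and the Base64 step each chunk goes through before it is placed in an input_audio_buffer.append event; it is illustrative only and makes no network calls.

```python
import base64

# Chunk sizing used by the examples below: PCM16 (2 bytes/sample), mono, 16 kHz.
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2
CHUNK_SECONDS = 0.1

chunk_size = int(SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_SECONDS)
print(chunk_size)  # 3200 bytes per ~0.1 s chunk

# Each chunk is Base64-encoded before it is placed in the
# "audio" field of an input_audio_buffer.append event.
silence = bytes(chunk_size)  # 0.1 s of digital silence as a stand-in chunk
encoded = base64.b64encode(silence).decode("ascii")
print(len(encoded))  # 4268 characters of Base64 text
```

At an 8 kHz sample rate the same 100 ms cadence would need only 1600-byte chunks.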
pip uninstall websocket-client
pip uninstall websocket
pip install websocket-client
Do not name the sample code file websocket.py. Otherwise, the following error may occur: AttributeError: module 'websocket' has no attribute 'WebSocketApp'. Did you mean: 'WebSocket'?
# pip install websocket-client
import os
import time
import json
import threading
import base64
import logging
import logging.handlers
from datetime import datetime

import websocket

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

# The API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key.
# If you have not configured the environment variable, replace the following line with your Model Studio API key: API_KEY = "sk-xxx"
API_KEY = os.environ.get("DASHSCOPE_API_KEY", "sk-xxx")
QWEN_MODEL = "qwen3-asr-flash-realtime"
# The following is the base URL for the Singapore region. If you use the model in the Beijing region, replace the base URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
baseUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
url = f"{baseUrl}?model={QWEN_MODEL}"
print(f"Connecting to server: {url}")

# Note: Outside VAD mode, the cumulative duration of continuously sent audio must not exceed 60 seconds.
enableServerVad = True
is_running = True  # Running flag.

headers = [
    "Authorization: Bearer " + API_KEY,
    "OpenAI-Beta: realtime=v1"
]


def init_logger():
    formatter = logging.Formatter('%(asctime)s|%(levelname)s|%(message)s')
    f_handler = logging.handlers.RotatingFileHandler(
        "omni_tester.log", maxBytes=100 * 1024 * 1024, backupCount=3
    )
    f_handler.setLevel(logging.DEBUG)
    f_handler.setFormatter(formatter)
    console = logging.StreamHandler()
    console.setLevel(logging.DEBUG)
    console.setFormatter(formatter)
    logger.addHandler(f_handler)
    logger.addHandler(console)


def on_open(ws):
    logger.info("Connected to server.")
    # Session update events.
    event_manual = {
        "event_id": "event_123",
        "type": "session.update",
        "session": {
            "modalities": ["text"],
            "input_audio_format": "pcm",
            "sample_rate": 16000,
            "input_audio_transcription": {
                # Language identifier, optional. Set it here if you know the language in advance.
                "language": "zh"
            },
            "turn_detection": None
        }
    }
    event_vad = {
        "event_id": "event_123",
        "type": "session.update",
        "session": {
            "modalities": ["text"],
            "input_audio_format": "pcm",
            "sample_rate": 16000,
            "input_audio_transcription": {
                "language": "zh"
            },
            "turn_detection": {
                "type": "server_vad",
                "threshold": 0.0,
                "silence_duration_ms": 400
            }
        }
    }
    if enableServerVad:
        logger.info(f"Sending event: {json.dumps(event_vad, indent=2)}")
        ws.send(json.dumps(event_vad))
    else:
        logger.info(f"Sending event: {json.dumps(event_manual, indent=2)}")
        ws.send(json.dumps(event_manual))


def on_message(ws, message):
    global is_running
    try:
        data = json.loads(message)
        logger.info(f"Received event: {json.dumps(data, ensure_ascii=False, indent=2)}")
        if data.get("type") == "session.finished":
            logger.info(f"Final transcript: {data.get('transcript')}")
            logger.info("Closing WebSocket connection after session finished...")
            is_running = False  # Stop the audio-sending thread.
            ws.close()
    except json.JSONDecodeError:
        logger.error(f"Failed to parse message: {message}")


def on_error(ws, error):
    logger.error(f"Error: {error}")


def on_close(ws, close_status_code, close_msg):
    logger.info(f"Connection closed: {close_status_code} - {close_msg}")


def send_audio(ws, local_audio_path):
    time.sleep(3)  # Wait for the session update to complete.
    global is_running
    with open(local_audio_path, 'rb') as audio_file:
        logger.info(f"Start reading the file: {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}")
        while is_running:
            audio_data = audio_file.read(3200)  # ~0.1 s of PCM16/16 kHz audio.
            if not audio_data:
                logger.info(f"Finished reading the file: {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}")
                if ws.sock and ws.sock.connected:
                    if not enableServerVad:
                        commit_event = {
                            "event_id": "event_789",
                            "type": "input_audio_buffer.commit"
                        }
                        ws.send(json.dumps(commit_event))
                    finish_event = {
                        "event_id": "event_987",
                        "type": "session.finish"
                    }
                    ws.send(json.dumps(finish_event))
                break
            if not ws.sock or not ws.sock.connected:
                logger.info("The WebSocket is closed. Stop sending audio.")
                break
            encoded_data = base64.b64encode(audio_data).decode('utf-8')
            eventd = {
                "event_id": f"event_{int(time.time() * 1000)}",
                "type": "input_audio_buffer.append",
                "audio": encoded_data
            }
            ws.send(json.dumps(eventd))
            logger.info(f"Sending audio event: {eventd['event_id']}")
            time.sleep(0.1)  # Simulate real-time capture.


# Initialize the logger.
init_logger()
logger.info(f"Connecting to WebSocket server at {url}...")
local_audio_path = "your_audio_file.pcm"
ws = websocket.WebSocketApp(
    url,
    header=headers,
    on_open=on_open,
    on_message=on_message,
    on_error=on_error,
    on_close=on_close
)
thread = threading.Thread(target=send_audio, args=(ws, local_audio_path))
thread.start()
ws.run_forever()
Java
Before running the example, install the Java-WebSocket dependency:
Maven
<dependency>
    <groupId>org.java-websocket</groupId>
    <artifactId>Java-WebSocket</artifactId>
    <version>1.5.6</version>
</dependency>
Gradle
implementation 'org.java-websocket:Java-WebSocket:1.5.6'
import org.java_websocket.client.WebSocketClient;
import org.java_websocket.handshake.ServerHandshake;
import org.json.JSONObject;

import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.*;

public class QwenASRRealtimeClient {
    private static final Logger logger = Logger.getLogger(QwenASRRealtimeClient.class.getName());
    // The API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key.
    // If you have not configured the environment variable, replace the following line with your Model Studio API key: private static final String API_KEY = "sk-xxx";
    private static final String API_KEY = System.getenv().getOrDefault("DASHSCOPE_API_KEY", "sk-xxx");
    private static final String MODEL = "qwen3-asr-flash-realtime";
    // Controls whether to use VAD mode.
    private static final boolean enableServerVad = true;
    private static final AtomicBoolean isRunning = new AtomicBoolean(true);
    private static WebSocketClient client;

    public static void main(String[] args) throws Exception {
        initLogger();
        // The following is the base URL for the Singapore region. If you use the model in the Beijing region, replace the base URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
        String baseUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime";
        String url = baseUrl + "?model=" + MODEL;
        logger.info("Connecting to server: " + url);

        client = new WebSocketClient(new URI(url)) {
            @Override
            public void onOpen(ServerHandshake handshake) {
                logger.info("Connected to server.");
                sendSessionUpdate();
            }

            @Override
            public void onMessage(String message) {
                try {
                    JSONObject data = new JSONObject(message);
                    String eventType = data.optString("type");
                    logger.info("Received event: " + data.toString(2));
                    // When the finished event arrives, stop the sending thread and close the connection.
                    if ("session.finished".equals(eventType)) {
                        logger.info("Final transcript: " + data.optString("transcript"));
                        logger.info("Closing WebSocket connection after session finished...");
                        isRunning.set(false); // Stop the audio-sending thread.
                        if (this.isOpen()) {
                            this.close(1000, "ASR finished");
                        }
                    }
                } catch (Exception e) {
                    logger.severe("Failed to parse message: " + message);
                }
            }

            @Override
            public void onClose(int code, String reason, boolean remote) {
                logger.info("Connection closed: " + code + " - " + reason);
            }

            @Override
            public void onError(Exception ex) {
                logger.severe("Error: " + ex.getMessage());
            }
        };

        // Add request headers.
        client.addHeader("Authorization", "Bearer " + API_KEY);
        client.addHeader("OpenAI-Beta", "realtime=v1");
        client.connectBlocking(); // Block until the connection is established.

        // Replace with the path of the audio file to recognize.
        String localAudioPath = "your_audio_file.pcm";
        Thread audioThread = new Thread(() -> {
            try {
                sendAudio(localAudioPath);
            } catch (Exception e) {
                logger.severe("Audio sending thread error: " + e.getMessage());
            }
        });
        audioThread.start();
    }

    /** Session update event (enable/disable VAD). */
    private static void sendSessionUpdate() {
        JSONObject eventNoVad = new JSONObject()
                .put("event_id", "event_123")
                .put("type", "session.update")
                .put("session", new JSONObject()
                        .put("modalities", new String[]{"text"})
                        .put("input_audio_format", "pcm")
                        .put("sample_rate", 16000)
                        .put("input_audio_transcription", new JSONObject()
                                .put("language", "zh"))
                        .put("turn_detection", JSONObject.NULL) // Manual mode.
                );
        JSONObject eventVad = new JSONObject()
                .put("event_id", "event_123")
                .put("type", "session.update")
                .put("session", new JSONObject()
                        .put("modalities", new String[]{"text"})
                        .put("input_audio_format", "pcm")
                        .put("sample_rate", 16000)
                        .put("input_audio_transcription", new JSONObject()
                                .put("language", "zh"))
                        .put("turn_detection", new JSONObject()
                                .put("type", "server_vad")
                                .put("threshold", 0.0)
                                .put("silence_duration_ms", 400))
                );
        if (enableServerVad) {
            logger.info("Sending event (VAD):\n" + eventVad.toString(2));
            client.send(eventVad.toString());
        } else {
            logger.info("Sending event (Manual):\n" + eventNoVad.toString(2));
            client.send(eventNoVad.toString());
        }
    }

    /** Stream the audio file. */
    private static void sendAudio(String localAudioPath) throws Exception {
        Thread.sleep(3000); // Wait for the session to become ready.
        byte[] allBytes = Files.readAllBytes(Paths.get(localAudioPath));
        logger.info("Start reading the file.");
        int offset = 0;
        while (isRunning.get() && offset < allBytes.length) {
            int chunkSize = Math.min(3200, allBytes.length - offset);
            byte[] chunk = new byte[chunkSize];
            System.arraycopy(allBytes, offset, chunk, 0, chunkSize);
            offset += chunkSize;
            if (client != null && client.isOpen()) {
                String encoded = Base64.getEncoder().encodeToString(chunk);
                JSONObject eventd = new JSONObject()
                        .put("event_id", "event_" + System.currentTimeMillis())
                        .put("type", "input_audio_buffer.append")
                        .put("audio", encoded);
                client.send(eventd.toString());
                logger.info("Sending audio event: " + eventd.getString("event_id"));
            } else {
                break; // Avoid sending after a disconnect.
            }
            Thread.sleep(100); // Simulate real-time sending.
        }
        logger.info("Finished reading the file.");
        if (client != null && client.isOpen()) {
            // A commit is required in non-VAD mode.
            if (!enableServerVad) {
                JSONObject commitEvent = new JSONObject()
                        .put("event_id", "event_789")
                        .put("type", "input_audio_buffer.commit");
                client.send(commitEvent.toString());
                logger.info("Sent commit event for manual mode.");
            }
            JSONObject finishEvent = new JSONObject()
                    .put("event_id", "event_987")
                    .put("type", "session.finish");
            client.send(finishEvent.toString());
            logger.info("Sent finish event.");
        }
    }

    /** Initialize the logger. */
    private static void initLogger() {
        logger.setLevel(Level.ALL);
        Logger rootLogger = Logger.getLogger("");
        for (Handler h : rootLogger.getHandlers()) {
            rootLogger.removeHandler(h);
        }
        Handler consoleHandler = new ConsoleHandler();
        consoleHandler.setLevel(Level.ALL);
        consoleHandler.setFormatter(new SimpleFormatter());
        logger.addHandler(consoleHandler);
    }
}
Node.js
Before running the example, install the dependency by running the following command:
npm install ws
/**
 * Qwen-ASR Realtime WebSocket Client (Node.js version)
 * Features:
 * - Supports VAD and Manual modes.
 * - Sends session.update to start the session.
 * - Continuously sends input_audio_buffer.append audio chunks.
 * - Sends input_audio_buffer.commit in Manual mode.
 * - Sends the session.finish event.
 * - Closes the connection after receiving the session.finished event.
 */
import WebSocket from 'ws';
import fs from 'fs';

// ===== Configuration =====
// The API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key.
// If you have not configured the environment variable, replace the following line with your Model Studio API key: const API_KEY = "sk-xxx";
const API_KEY = process.env.DASHSCOPE_API_KEY || 'sk-xxx';
const MODEL = 'qwen3-asr-flash-realtime';
const enableServerVad = true; // true for VAD mode, false for Manual mode
const localAudioPath = 'your_audio_file.pcm'; // Path to a PCM16, 16 kHz audio file
// The following is the base URL for the Singapore region. If you use the model in the Beijing region, replace the base URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
const baseUrl = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime';
const url = `${baseUrl}?model=${MODEL}`;
console.log(`Connecting to server: ${url}`);

// ===== State Control =====
let isRunning = true;

// ===== Create the Connection =====
const ws = new WebSocket(url, {
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'OpenAI-Beta': 'realtime=v1'
  }
});

// ===== Event Bindings =====
ws.on('open', () => {
  console.log('[WebSocket] Connected to server.');
  sendSessionUpdate();
  // Start sending audio.
  sendAudio(localAudioPath);
});

ws.on('message', (message) => {
  try {
    const data = JSON.parse(message);
    console.log('[Received Event]:', JSON.stringify(data, null, 2));
    // Finished event received.
    if (data.type === 'session.finished') {
      console.log(`[Final Transcript] ${data.transcript}`);
      console.log('[Action] Closing WebSocket connection after session finished...');
      if (ws.readyState === WebSocket.OPEN) {
        ws.close(1000, 'ASR finished');
      }
    }
  } catch (e) {
    console.error('[Error] Failed to parse message:', message);
  }
});

ws.on('close', (code, reason) => {
  console.log(`[WebSocket] Connection closed: ${code} - ${reason}`);
});

ws.on('error', (err) => {
  console.error('[WebSocket Error]', err);
});

// ===== Session Update =====
function sendSessionUpdate() {
  const eventNoVad = {
    event_id: 'event_123',
    type: 'session.update',
    session: {
      modalities: ['text'],
      input_audio_format: 'pcm',
      sample_rate: 16000,
      input_audio_transcription: { language: 'zh' },
      turn_detection: null
    }
  };
  const eventVad = {
    event_id: 'event_123',
    type: 'session.update',
    session: {
      modalities: ['text'],
      input_audio_format: 'pcm',
      sample_rate: 16000,
      input_audio_transcription: { language: 'zh' },
      turn_detection: {
        type: 'server_vad',
        threshold: 0.0,
        silence_duration_ms: 400
      }
    }
  };
  if (enableServerVad) {
    console.log('[Send Event] VAD Mode:\n', JSON.stringify(eventVad, null, 2));
    ws.send(JSON.stringify(eventVad));
  } else {
    console.log('[Send Event] Manual Mode:\n', JSON.stringify(eventNoVad, null, 2));
    ws.send(JSON.stringify(eventNoVad));
  }
}

// ===== Stream the Audio File =====
function sendAudio(audioPath) {
  setTimeout(() => {
    console.log(`[File Read Start] ${audioPath}`);
    const buffer = fs.readFileSync(audioPath);
    let offset = 0;
    const chunkSize = 3200; // About 0.1 s of PCM16 audio

    function sendChunk() {
      if (!isRunning) return;
      if (offset >= buffer.length) {
        isRunning = false; // Stop sending audio.
        console.log('[File Read End]');
        if (ws.readyState === WebSocket.OPEN) {
          if (!enableServerVad) {
            const commitEvent = { event_id: 'event_789', type: 'input_audio_buffer.commit' };
            ws.send(JSON.stringify(commitEvent));
            console.log('[Send Commit Event]');
          }
          const finishEvent = { event_id: 'event_987', type: 'session.finish' };
          ws.send(JSON.stringify(finishEvent));
          console.log('[Send Finish Event]');
        }
        return;
      }
      if (ws.readyState !== WebSocket.OPEN) {
        console.log('[Stop] WebSocket is not open.');
        return;
      }
      const chunk = buffer.slice(offset, offset + chunkSize);
      offset += chunkSize;
      const encoded = chunk.toString('base64');
      const appendEvent = {
        event_id: `event_${Date.now()}`,
        type: 'input_audio_buffer.append',
        audio: encoded
      };
      ws.send(JSON.stringify(appendEvent));
      console.log(`[Send Audio Event] ${appendEvent.event_id}`);
      setTimeout(sendChunk, 100); // Simulate real-time sending.
    }

    sendChunk();
  }, 3000); // Wait for the session configuration to complete.
}
API reference
Model features
Feature | qwen3-asr-flash-realtime, qwen3-asr-flash-realtime-2026-02-10, qwen3-asr-flash-realtime-2025-10-27 |
Supported languages | Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish |
Supported audio formats | pcm, opus |
Sample rate | 8 kHz, 16 kHz |
Channels | Mono |
Input format | Binary audio stream |
Audio size/duration | Unlimited |
Emotion recognition | Always on |
Sensitive word filtering | |
Speaker diarization | |
Filler word filtering | |
Timestamps | |
Punctuation prediction | Always on |
Inverse Text Normalization (ITN) | |
Voice Activity Detection (VAD) | Always on |
Rate limit (RPS) | 20 |
Price | International: $0.00009/second; Mainland China: $0.000047/second |
Connection type | Java/Python SDK, WebSocket API |
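If you do not have a recording in the required format at hand, the constraints in the table (pcm format, 16 kHz, mono) can be exercised with a synthetic file. The sketch below writes a 2-second 440 Hz tone as headerless little-endian PCM16 that any of the samples above can stream; the file name and tone frequency are arbitrary choices, not part of the API. Because a pure tone carries no speech, expect an empty or trivial transcript — use this to validate the streaming pipeline, not recognition quality.

```python
import math
import struct

# Synthesize 2 s of a 440 Hz sine tone as headerless PCM16, 16 kHz, mono.
# "test_tone.pcm" is a placeholder name; pass its path to the samples above.
SAMPLE_RATE = 16000
DURATION_S = 2
AMPLITUDE = 16000  # well below the int16 limit of 32767

frames = bytearray()
for n in range(SAMPLE_RATE * DURATION_S):
    sample = int(AMPLITUDE * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    frames += struct.pack("<h", sample)  # little-endian signed 16-bit sample

with open("test_tone.pcm", "wb") as f:
    f.write(frames)

print(len(frames))  # 64000 bytes = 2 s x 16000 samples/s x 2 bytes/sample
```

For real speech recordings, resample to 16 kHz mono PCM16 with your audio tool of choice before streaming.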