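Alibaba Cloud Model Studio: Real-time speech recognition - Qwen

Last updated: May 09, 2026

In scenarios such as live streaming, online meetings, voice chat, and smart assistants, a continuous audio stream must be converted to text in real time to provide live captions, generate meeting records, or respond to voice commands. The Qwen real-time speech recognition service receives the audio stream over WebSocket, transcribes it in real time, and returns intermediate and final results with low latency.

Core features

  • Multilingual high-accuracy recognition: supports high-accuracy speech recognition in multiple languages (including Mandarin and several dialects, such as Cantonese and Sichuanese). For details, see Model features.

  • Adaptation to complex environments: handles complex acoustic environments, with automatic language detection and intelligent filtering of non-human sounds.

  • Emotion recognition: recognizes multiple emotional states (surprise, calm, happiness, sadness, disgust, anger, and fear).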

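Applicable scope

Supported models:

Chinese mainland

When the service deployment scope is the Chinese mainland, model inference compute resources are restricted to the Chinese mainland; data at rest is stored in the region you select. Supported region for this deployment scope: China North 2 (Beijing).

When calling the following models, use an API Key from the Beijing region.

Qwen3-ASR-Flash-Realtime: qwen3-asr-flash-realtime (stable version, currently equivalent to qwen3-asr-flash-realtime-2025-10-27), qwen3-asr-flash-realtime-2026-02-10 (latest snapshot), qwen3-asr-flash-realtime-2025-10-27 (snapshot)

International

When the service deployment scope is international, model inference compute resources are scheduled dynamically worldwide (excluding the Chinese mainland); data at rest is stored in the region you select. Supported region for this deployment scope: Singapore.

When calling the following models, use an API Key from the Singapore region.

Qwen3-ASR-Flash-Realtime: qwen3-asr-flash-realtime (stable version, currently equivalent to qwen3-asr-flash-realtime-2025-10-27), qwen3-asr-flash-realtime-2026-02-10 (latest snapshot), qwen3-asr-flash-realtime-2025-10-27 (snapshot)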

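Model selection

| Scenario | Recommended model | Reason |
| --- | --- | --- |
| Intelligent customer service quality inspection | qwen3-asr-flash-realtime-2026-02-10 | Analyzes call content and customer emotion in real time to assist agents and monitor service quality |
| Live streaming/short video | qwen3-asr-flash-realtime-2026-02-10 | Generates real-time captions for live content, reaching multilingual audiences |
| Online meetings/interviews | qwen3-asr-flash-realtime-2026-02-10 | Records meeting speech in real time and quickly produces text minutes, making information easier to organize |

For more information, see Model features.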
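The region and model choices above determine which WebSocket endpoint a client connects to. The following is a minimal sketch, not part of the official SDK: `REALTIME_ENDPOINTS` and `realtime_url` are hypothetical names introduced for illustration, and the two base URLs are the ones that appear in the samples on this page. The API Key must still come from the same region as the endpoint.

```python
# Base URLs taken from the samples on this page; the dict and helper names are hypothetical.
REALTIME_ENDPOINTS = {
    "beijing": "wss://dashscope.aliyuncs.com/api-ws/v1/realtime",
    "singapore": "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime",
}


def realtime_url(region: str, model: str = "qwen3-asr-flash-realtime") -> str:
    """Build the WebSocket URL for a region and model (illustrative helper)."""
    try:
        base = REALTIME_ENDPOINTS[region.lower()]
    except KeyError:
        raise ValueError(f"Unsupported region: {region!r}") from None
    # The raw WebSocket samples pass the model as a query parameter.
    return f"{base}?model={model}"


print(realtime_url("singapore", "qwen3-asr-flash-realtime-2026-02-10"))
```

Keeping the mapping in one place makes it harder to pair a Beijing API Key with the Singapore endpoint by accident, or the reverse.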

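Getting started

Use the DashScope SDK

Java

  1. Install the SDK. Make sure the DashScope SDK version is 2.22.5 or later.

  2. Get an API Key. We recommend configuring the API Key as an environment variable to avoid hard-coding it in your code.

  3. Run the sample code.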

    import com.alibaba.dashscope.audio.omni.*;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import com.google.gson.JsonObject;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    import javax.sound.sampled.LineUnavailableException;
    import java.io.File;
    import java.io.FileInputStream;
    import java.util.Base64;
    import java.util.Collections;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.atomic.AtomicReference;
    
    public class Qwen3AsrRealtimeUsage {
        private static final Logger log = LoggerFactory.getLogger(Qwen3AsrRealtimeUsage.class);
        private static final int AUDIO_CHUNK_SIZE = 1024; // Audio chunk size in bytes
        private static final int SLEEP_INTERVAL_MS = 30;  // Sleep interval in milliseconds
    
        public static void main(String[] args) throws InterruptedException, LineUnavailableException {
            CountDownLatch finishLatch = new CountDownLatch(1);
    
            OmniRealtimeParam param = OmniRealtimeParam.builder()
                    .model("qwen3-asr-flash-realtime")
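                    // The URL below is for the Singapore region. To use a model in the Beijing region, replace it with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                    .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                    // API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                    // If the environment variable is not configured, replace the next line with your Model Studio API Key: .apikey("sk-xxx")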
                    .apikey(System.getenv("DASHSCOPE_API_KEY"))
                    .build();
    
            OmniRealtimeConversation conversation = null;
            final AtomicReference<OmniRealtimeConversation> conversationRef = new AtomicReference<>(null);
            conversation = new OmniRealtimeConversation(param, new OmniRealtimeCallback() {
                @Override
                public void onOpen() {
                    System.out.println("connection opened");
                }
                @Override
                public void onEvent(JsonObject message) {
                    String type = message.get("type").getAsString();
                    switch(type) {
                        case "session.created":
                            System.out.println("start session: " + message.get("session").getAsJsonObject().get("id").getAsString());
                            break;
                        case "conversation.item.input_audio_transcription.completed":
                            System.out.println("transcription: " + message.get("transcript").getAsString());
                            finishLatch.countDown();
                            break;
                        case "input_audio_buffer.speech_started":
                            System.out.println("======VAD Speech Start======");
                            break;
                        case "input_audio_buffer.speech_stopped":
                            System.out.println("======VAD Speech Stop======");
                            break;
                        case "conversation.item.input_audio_transcription.text":
                            System.out.println("transcription: " + message.get("text").getAsString() + message.get("stash").getAsString());
                            break;
                        default:
                            break;
                    }
                }
                @Override
                public void onClose(int code, String reason) {
                    System.out.println("connection closed code: " + code + ", reason: " + reason);
                }
            });
            conversationRef.set(conversation);
            try {
                conversation.connect();
            } catch (NoApiKeyException e) {
                throw new RuntimeException(e);
            }
    
            OmniRealtimeTranscriptionParam transcriptionParam = new OmniRealtimeTranscriptionParam();
            transcriptionParam.setLanguage("zh");
            transcriptionParam.setInputAudioFormat("pcm");
            transcriptionParam.setInputSampleRate(16000);
    
            OmniRealtimeConfig config = OmniRealtimeConfig.builder()
                    .modalities(Collections.singletonList(OmniRealtimeModality.TEXT))
                    .transcriptionConfig(transcriptionParam)
                    .build();
            conversation.updateSession(config);
    
            String filePath = "your_audio_file.pcm";
            File audioFile = new File(filePath);
            if (!audioFile.exists()) {
                log.error("Audio file not found: {}", filePath);
                return;
            }
    
            try (FileInputStream audioInputStream = new FileInputStream(audioFile)) {
                byte[] audioBuffer = new byte[AUDIO_CHUNK_SIZE];
                int bytesRead;
                int totalBytesRead = 0;
    
                log.info("Starting to send audio data from: {}", filePath);
    
                // Read and send audio data in chunks
                while ((bytesRead = audioInputStream.read(audioBuffer)) != -1) {
                    totalBytesRead += bytesRead;
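                    // Encode only the bytes actually read; the final chunk may be shorter than the buffer
                    String audioB64 = Base64.getEncoder().encodeToString(java.util.Arrays.copyOf(audioBuffer, bytesRead));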
                    // Send audio chunk to conversation
                    conversation.appendAudio(audioB64);
    
                    // Add small delay to simulate real-time audio streaming
                    Thread.sleep(SLEEP_INTERVAL_MS);
                }
    
                log.info("Finished sending audio data. Total bytes sent: {}", totalBytesRead);
    
            } catch (Exception e) {
                log.error("Error sending audio from file: {}", filePath, e);
            }
    
            //send session.finish and wait for finish and close
            conversation.endSession();
            log.info("task finished");
    
            System.exit(0);
        }
    }

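Python

  1. Install the SDK. Make sure the DashScope SDK version is 1.25.6 or later.

  2. Get an API Key. We recommend configuring the API Key as an environment variable to avoid hard-coding it in your code.

  3. Run the sample code.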

    import logging
    import os
    import base64
    import signal
    import sys
    import time
    import dashscope
    from dashscope.audio.qwen_omni import *
    from dashscope.audio.qwen_omni.omni_realtime import TranscriptionParams
    
    def setup_logging():
        """Configure log output."""
        logger = logging.getLogger('dashscope')
        logger.setLevel(logging.DEBUG)
        handler = logging.StreamHandler(sys.stdout)
        handler.setLevel(logging.DEBUG)
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        logger.propagate = False
        return logger
    
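    def init_api_key():
        """Initialize the API Key."""
        # API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If the environment variable is not configured, replace the next line with your Model Studio API Key: dashscope.api_key = "sk-xxx"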
        dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY', 'YOUR_API_KEY')
        if dashscope.api_key == 'YOUR_API_KEY':
            print('[Warning] Using placeholder API key, set DASHSCOPE_API_KEY environment variable.')
    
    class MyCallback(OmniRealtimeCallback):
        """Callback handler for real-time recognition events."""
        def __init__(self, conversation):
            self.conversation = conversation
            self.handlers = {
                'session.created': self._handle_session_created,
                'conversation.item.input_audio_transcription.completed': self._handle_final_text,
                'conversation.item.input_audio_transcription.text': self._handle_transcription_text,
                'input_audio_buffer.speech_started': lambda r: print('======Speech Start======'),
                'input_audio_buffer.speech_stopped': lambda r: print('======Speech Stop======')
            }
    
        def on_open(self):
            print('Connection opened')
    
        def on_close(self, code, msg):
            print(f'Connection closed, code: {code}, msg: {msg}')
    
        def on_event(self, response):
            try:
                handler = self.handlers.get(response['type'])
                if handler:
                    handler(response)
            except Exception as e:
                print(f'[Error] {e}')
    
        def _handle_session_created(self, response):
            print(f"Start session: {response['session']['id']}")
    
        def _handle_final_text(self, response):
            print(f"Final recognized text: {response['transcript']}")
    
        def _handle_transcription_text(self, response):
            print(f"Got transcription result: {response['text'] + response['stash']}")
    
    def read_audio_chunks(file_path, chunk_size=3200):
        """Read the audio file in chunks."""
        with open(file_path, 'rb') as f:
            while chunk := f.read(chunk_size):
                yield chunk
    
    def send_audio(conversation, file_path, delay=0.1):
        """Send the audio data chunk by chunk."""
        if not os.path.exists(file_path):
            raise FileNotFoundError(f"Audio file {file_path} does not exist.")
    
        print("Processing audio file... Press 'Ctrl+C' to stop.")
        for chunk in read_audio_chunks(file_path):
            audio_b64 = base64.b64encode(chunk).decode('ascii')
            conversation.append_audio(audio_b64)
            time.sleep(delay)
    
    def main():
        setup_logging()
        init_api_key()
    
        audio_file_path = "./your_audio_file.pcm"
        conversation = OmniRealtimeConversation(
            model='qwen3-asr-flash-realtime',
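            # The URL below is for the Singapore region. To use a model in the Beijing region, replace it with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
            url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime',
            callback=MyCallback(conversation=None)  # Pass None for now; the conversation is injected below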
        )
    
        # Inject the conversation into the callback
        conversation.callback.conversation = conversation
    
        def handle_exit(sig, frame):
            print('Ctrl+C pressed, exiting...')
            conversation.close()
            sys.exit(0)
    
        signal.signal(signal.SIGINT, handle_exit)
    
        conversation.connect()
    
        transcription_params = TranscriptionParams(
            language='zh',
            sample_rate=16000,
            input_audio_format="pcm"
        )
    
        conversation.update_session(
            output_modalities=[MultiModality.TEXT],
            enable_input_audio_transcription=True,
            transcription_params=transcription_params
        )
    
        try:
            send_audio(conversation, audio_file_path)
            # send session.finish and wait for finished and close
            conversation.end_session()
        except Exception as e:
            print(f"Error occurred: {e}")
        finally:
            conversation.close()
            print("Audio processing completed.")
    
    if __name__ == '__main__':
        main()
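The chunk sizes and delays in the SDK samples are chosen so that audio is sent at roughly real-time pace. The arithmetic can be sketched as follows, assuming raw 16-bit (2 bytes per sample) mono PCM, which matches the PCM16/16 kHz file used in the WebSocket samples on this page. The function names are hypothetical helpers, not SDK calls.

```python
def pcm_chunk_bytes(sample_rate_hz: int, chunk_ms: int,
                    bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Number of raw PCM bytes that cover chunk_ms milliseconds of audio."""
    return sample_rate_hz * bytes_per_sample * channels * chunk_ms // 1000


def pcm_chunk_ms(sample_rate_hz: int, chunk_bytes: int,
                 bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Duration in milliseconds covered by chunk_bytes of raw PCM."""
    return chunk_bytes * 1000 / (sample_rate_hz * bytes_per_sample * channels)


# The Python sample sends 3200-byte chunks every 0.1 s at 16 kHz.
print(pcm_chunk_bytes(16000, 100))  # 3200
# The Java sample sends 1024-byte chunks, which cover 32 ms at 16 kHz.
print(pcm_chunk_ms(16000, 1024))    # 32.0
```

If the input is sampled at 8 kHz instead, the same 100 ms chunk is 1600 bytes, so the chunk size should be recomputed rather than reused.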

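Use the WebSocket API

The following samples show how to send a local audio file over a WebSocket connection and obtain the recognition results. The code performs API authentication, establishes a WebSocket session, sends the audio data in chunks, and outputs the intermediate and final transcription results.

  1. Get an API Key: get an API Key and, for security, configure it as an environment variable.

  2. Write and run the code: the following samples cover the complete flow of API authentication, establishing a WebSocket session, sending audio data in chunks, and receiving transcription results (for details, see Interaction flow).

    Python

    Before running the sample, make sure the dependency is installed by using the following commands: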

    pip uninstall websocket-client
    pip uninstall websocket
    pip install websocket-client

    Do not name the sample code file websocket.py. That name conflicts with the websocket library and causes the following error: AttributeError: module 'websocket' has no attribute 'WebSocketApp'. Did you mean: 'WebSocket'?

    # pip install websocket-client
    import os
    import time
    import json
    import threading
    import base64
    import websocket
    import logging
    import logging.handlers
    from datetime import datetime
    
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)
    
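    # API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If the environment variable is not configured, replace the next line with your Model Studio API Key: API_KEY="sk-xxx"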
    API_KEY = os.environ.get("DASHSCOPE_API_KEY", "sk-xxx")
    QWEN_MODEL = "qwen3-asr-flash-realtime"
    # The baseUrl below is for the Singapore region. To use a model in the Beijing region, replace it with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    baseUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
    url = f"{baseUrl}?model={QWEN_MODEL}"
    print(f"Connecting to server: {url}")
    
    # Note: in non-VAD mode, keep the cumulative duration of continuously sent audio within 60 s
    enableServerVad = True
    is_running = True  # Run flag for the audio-sending thread
    
    headers = [
        "Authorization: Bearer " + API_KEY,
        "OpenAI-Beta: realtime=v1"
    ]
    
    def init_logger():
        formatter = logging.Formatter('%(asctime)s|%(levelname)s|%(message)s')
        f_handler = logging.handlers.RotatingFileHandler(
            "omni_tester.log", maxBytes=100 * 1024 * 1024, backupCount=3
        )
        f_handler.setLevel(logging.DEBUG)
        f_handler.setFormatter(formatter)
    
        console = logging.StreamHandler()
        console.setLevel(logging.DEBUG)
        console.setFormatter(formatter)
    
        logger.addHandler(f_handler)
        logger.addHandler(console)
    
    def on_open(ws):
        logger.info("Connected to server.")
    
        # Session update events
        event_manual = {
            "event_id": "event_123",
            "type": "session.update",
            "session": {
                "modalities": ["text"],
                "input_audio_format": "pcm",
                "sample_rate": 16000,
                "input_audio_transcription": {
                    # Language identifier; optional, but recommended when the language is known
                    "language": "zh"
                },
                "turn_detection": None
            }
        }
        event_vad = {
            "event_id": "event_123",
            "type": "session.update",
            "session": {
                "modalities": ["text"],
                "input_audio_format": "pcm",
                "sample_rate": 16000,
                "input_audio_transcription": {
                    "language": "zh"
                },
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.0,
                    "silence_duration_ms": 400
                }
            }
        }
        if enableServerVad:
            logger.info(f"Sending event: {json.dumps(event_vad, indent=2)}")
            ws.send(json.dumps(event_vad))
        else:
            logger.info(f"Sending event: {json.dumps(event_manual, indent=2)}")
            ws.send(json.dumps(event_manual))
    
    def on_message(ws, message):
        global is_running
        try:
            data = json.loads(message)
            logger.info(f"Received event: {json.dumps(data, ensure_ascii=False, indent=2)}")
            if data.get("type") == "session.finished":
                logger.info(f"Final transcript: {data.get('transcript')}")
                logger.info("Closing WebSocket connection after session finished...")
                is_running = False  # Stop the audio-sending thread
                ws.close()
        except json.JSONDecodeError:
            logger.error(f"Failed to parse message: {message}")
    
    def on_error(ws, error):
        logger.error(f"Error: {error}")
    
    def on_close(ws, close_status_code, close_msg):
        logger.info(f"Connection closed: {close_status_code} - {close_msg}")
    
    def send_audio(ws, local_audio_path):
        time.sleep(3)  # Wait for the session update to complete
        global is_running
    
        with open(local_audio_path, 'rb') as audio_file:
            logger.info(f"File read started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}")
            while is_running:
                audio_data = audio_file.read(3200)  # ~0.1s PCM16/16kHz
                if not audio_data:
                    logger.info(f"File read finished: {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}")
                    if ws.sock and ws.sock.connected:
                        if not enableServerVad:
                            commit_event = {
                                "event_id": "event_789",
                                "type": "input_audio_buffer.commit"
                            }
                            ws.send(json.dumps(commit_event))
                        finish_event = {
                            "event_id": "event_987",
                            "type": "session.finish"
                        }
                        ws.send(json.dumps(finish_event))
                    break
    
                if not ws.sock or not ws.sock.connected:
                    logger.info("WebSocket is closed; stopping audio sending.")
                    break
    
                encoded_data = base64.b64encode(audio_data).decode('utf-8')
                eventd = {
                    "event_id": f"event_{int(time.time() * 1000)}",
                    "type": "input_audio_buffer.append",
                    "audio": encoded_data
                }
                ws.send(json.dumps(eventd))
                logger.info(f"Sending audio event: {eventd['event_id']}")
                time.sleep(0.1)  # Simulate real-time capture
    
    # Initialize logging
    init_logger()
    logger.info(f"Connecting to WebSocket server at {url}...")
    
    local_audio_path = "your_audio_file.pcm"
    ws = websocket.WebSocketApp(
        url,
        header=headers,
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close
    )
    
    thread = threading.Thread(target=send_audio, args=(ws, local_audio_path))
    thread.start()
    ws.run_forever()

    Java

    Before running the sample, make sure the Java-WebSocket dependency is installed:

    Maven

    <dependency>
        <groupId>org.java-websocket</groupId>
        <artifactId>Java-WebSocket</artifactId>
        <version>1.5.6</version>
    </dependency>

    Gradle

    implementation 'org.java-websocket:Java-WebSocket:1.5.6'

    import org.java_websocket.client.WebSocketClient;
    import org.java_websocket.handshake.ServerHandshake;
    import org.json.JSONObject;
    
    import java.net.URI;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Base64;
    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.logging.*;
    
    public class QwenASRRealtimeClient {
    
        private static final Logger logger = Logger.getLogger(QwenASRRealtimeClient.class.getName());
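        // API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If the environment variable is not configured, replace the next line with your Model Studio API Key: private static final String API_KEY = "sk-xxx"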
        private static final String API_KEY = System.getenv().getOrDefault("DASHSCOPE_API_KEY", "sk-xxx");
        private static final String MODEL = "qwen3-asr-flash-realtime";
    
        // Controls whether VAD mode is used
        private static final boolean enableServerVad = true;
    
        private static final AtomicBoolean isRunning = new AtomicBoolean(true);
        private static WebSocketClient client;
    
        public static void main(String[] args) throws Exception {
            initLogger();
    
            // The baseUrl below is for the Singapore region. To use a model in the Beijing region, replace it with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
            String baseUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime";
            String url = baseUrl + "?model=" + MODEL;
            logger.info("Connecting to server: " + url);
    
            client = new WebSocketClient(new URI(url)) {
                @Override
                public void onOpen(ServerHandshake handshake) {
                    logger.info("Connected to server.");
                    sendSessionUpdate();
                }
    
                @Override
                public void onMessage(String message) {
                    try {
                        JSONObject data = new JSONObject(message);
                        String eventType = data.optString("type");
    
                        logger.info("Received event: " + data.toString(2));
    
                        // Session finished event received: stop the sending thread and close the connection
                        if ("session.finished".equals(eventType)) {
                            logger.info("Final transcript: " + data.optString("transcript"));
                            logger.info("Closing WebSocket connection after session finished...");
    
                            isRunning.set(false); // Stop the audio-sending thread
                            if (this.isOpen()) {
                                this.close(1000, "ASR finished");
                            }
                        }
                    } catch (Exception e) {
                        logger.severe("Failed to parse message: " + message);
                    }
                }
    
                @Override
                public void onClose(int code, String reason, boolean remote) {
                    logger.info("Connection closed: " + code + " - " + reason);
                }
    
                @Override
                public void onError(Exception ex) {
                    logger.severe("Error: " + ex.getMessage());
                }
            };
    
            // Add request headers
            client.addHeader("Authorization", "Bearer " + API_KEY);
            client.addHeader("OpenAI-Beta", "realtime=v1");
    
            client.connectBlocking(); // Block until the connection is established
    
            // Replace with the path of the audio file to recognize
            String localAudioPath = "your_audio_file.pcm";
            Thread audioThread = new Thread(() -> {
                try {
                    sendAudio(localAudioPath);
                } catch (Exception e) {
                    logger.severe("Audio sending thread error: " + e.getMessage());
                }
            });
            audioThread.start();
        }
    
        /** Session update event (enable or disable VAD) */
        private static void sendSessionUpdate() {
            JSONObject eventNoVad = new JSONObject()
                    .put("event_id", "event_123")
                    .put("type", "session.update")
                    .put("session", new JSONObject()
                            .put("modalities", new String[]{"text"})
                            .put("input_audio_format", "pcm")
                            .put("sample_rate", 16000)
                            .put("input_audio_transcription", new JSONObject()
                                    .put("language", "zh"))
                            .put("turn_detection", JSONObject.NULL) // Manual mode
                    );
    
            JSONObject eventVad = new JSONObject()
                    .put("event_id", "event_123")
                    .put("type", "session.update")
                    .put("session", new JSONObject()
                            .put("modalities", new String[]{"text"})
                            .put("input_audio_format", "pcm")
                            .put("sample_rate", 16000)
                            .put("input_audio_transcription", new JSONObject()
                                    .put("language", "zh"))
                            .put("turn_detection", new JSONObject()
                                    .put("type", "server_vad")
                                    .put("threshold", 0.0)
                                    .put("silence_duration_ms", 400))
                    );
    
            if (enableServerVad) {
                logger.info("Sending event (VAD):\n" + eventVad.toString(2));
                client.send(eventVad.toString());
            } else {
                logger.info("Sending event (Manual):\n" + eventNoVad.toString(2));
                client.send(eventNoVad.toString());
            }
        }
    
        /** Send the audio file as a stream */
        private static void sendAudio(String localAudioPath) throws Exception {
            Thread.sleep(3000); // Wait for the session to be ready
            byte[] allBytes = Files.readAllBytes(Paths.get(localAudioPath));
            logger.info("File read started");
    
            int offset = 0;
            while (isRunning.get() && offset < allBytes.length) {
                int chunkSize = Math.min(3200, allBytes.length - offset);
                byte[] chunk = new byte[chunkSize];
                System.arraycopy(allBytes, offset, chunk, 0, chunkSize);
                offset += chunkSize;
    
                if (client != null && client.isOpen()) {
                    String encoded = Base64.getEncoder().encodeToString(chunk);
                    JSONObject eventd = new JSONObject()
                            .put("event_id", "event_" + System.currentTimeMillis())
                            .put("type", "input_audio_buffer.append")
                            .put("audio", encoded);
    
                    client.send(eventd.toString());
                    logger.info("Sending audio event: " + eventd.getString("event_id"));
                } else {
                    break; // Avoid sending after the connection is closed
                }
    
                Thread.sleep(100); // Simulate real-time sending
            }
    
            logger.info("File read finished");
    
            if (client != null && client.isOpen()) {
                // A commit is required in non-VAD mode
                if (!enableServerVad) {
                    JSONObject commitEvent = new JSONObject()
                            .put("event_id", "event_789")
                            .put("type", "input_audio_buffer.commit");
                    client.send(commitEvent.toString());
                    logger.info("Sent commit event for manual mode.");
                }
    
                JSONObject finishEvent = new JSONObject()
                        .put("event_id", "event_987")
                        .put("type", "session.finish");
                client.send(finishEvent.toString());
                logger.info("Sent finish event.");
            }
        }
    
        /** Initialize logging */
        private static void initLogger() {
            logger.setLevel(Level.ALL);
            Logger rootLogger = Logger.getLogger("");
            for (Handler h : rootLogger.getHandlers()) {
                rootLogger.removeHandler(h);
            }
    
            Handler consoleHandler = new ConsoleHandler();
            consoleHandler.setLevel(Level.ALL);
            consoleHandler.setFormatter(new SimpleFormatter());
            logger.addHandler(consoleHandler);
        }
    }

    Node.js

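    Before running the sample, make sure the dependency is installed by using the following command:

    npm install ws

    /**
     * Qwen-ASR Realtime WebSocket client (Node.js version)
     * Features:
     * - Supports VAD mode and Manual mode
     * - Sends session.update to start the session
     * - Continuously sends audio chunks with input_audio_buffer.append
     * - In Manual mode, input_audio_buffer.commit must be sent
     * - Sends the session.finish event
     * - Closes the connection after receiving the session.finished event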
     */
    
    import WebSocket from 'ws';
    import fs from 'fs';
    
    // ===== Configuration =====
    // API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    // If the environment variable is not configured, replace the next line with your Model Studio API Key: const API_KEY = "sk-xxx"
    const API_KEY = process.env.DASHSCOPE_API_KEY || 'sk-xxx';
    const MODEL = 'qwen3-asr-flash-realtime';
    const enableServerVad = true; // true for VAD mode, false for Manual mode
    const localAudioPath = 'your_audio_file.pcm'; // Path to a PCM16, 16 kHz audio file
    
    // The baseUrl below is for the Singapore region. To use a model in the Beijing region, replace it with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    const baseUrl = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime';
    const url = `${baseUrl}?model=${MODEL}`;
    
    console.log(`Connecting to server: ${url}`);
    
    // ===== State control =====
    let isRunning = true;
    
    // ===== Establish the connection =====
    const ws = new WebSocket(url, {
        headers: {
            'Authorization': `Bearer ${API_KEY}`,
            'OpenAI-Beta': 'realtime=v1'
        }
    });
    
    // ===== Event bindings =====
    ws.on('open', () => {
        console.log('[WebSocket] Connected to server.');
        sendSessionUpdate();
        // Start sending audio
        sendAudio(localAudioPath);
    });
    
    ws.on('message', (message) => {
        try {
            const data = JSON.parse(message);
            console.log('[Received Event]:', JSON.stringify(data, null, 2));
    
            // Session finished event received
            if (data.type === 'session.finished') {
                console.log(`[Final Transcript] ${data.transcript}`);
                console.log('[Action] Closing WebSocket connection after session finished...');
                
                if (ws.readyState === WebSocket.OPEN) {
                    ws.close(1000, 'ASR finished');
                }
            }
        } catch (e) {
            console.error('[Error] Failed to parse message:', message);
        }
    });
    
    ws.on('close', (code, reason) => {
        console.log(`[WebSocket] Connection closed: ${code} - ${reason}`);
    });
    
    ws.on('error', (err) => {
        console.error('[WebSocket Error]', err);
    });
    
    // ===== Session update =====
    function sendSessionUpdate() {
        const eventNoVad = {
            event_id: 'event_123',
            type: 'session.update',
            session: {
                modalities: ['text'],
                input_audio_format: 'pcm',
                sample_rate: 16000,
                input_audio_transcription: {
                    language: 'zh'
                },
                turn_detection: null
            }
        };
    
        const eventVad = {
            event_id: 'event_123',
            type: 'session.update',
            session: {
                modalities: ['text'],
                input_audio_format: 'pcm',
                sample_rate: 16000,
                input_audio_transcription: {
                    language: 'zh'
                },
                turn_detection: {
                    type: 'server_vad',
                    threshold: 0.0,
                    silence_duration_ms: 400
                }
            }
        };
    
        if (enableServerVad) {
            console.log('[Send Event] VAD Mode:\n', JSON.stringify(eventVad, null, 2));
            ws.send(JSON.stringify(eventVad));
        } else {
            console.log('[Send Event] Manual Mode:\n', JSON.stringify(eventNoVad, null, 2));
            ws.send(JSON.stringify(eventNoVad));
        }
    }
    
    // ===== Send the audio file as a stream =====
    function sendAudio(audioPath) {
        setTimeout(() => {
            console.log(`[File Read Start] ${audioPath}`);
            const buffer = fs.readFileSync(audioPath);
    
            let offset = 0;
            const chunkSize = 3200; // About 0.1 s of PCM16 audio at 16 kHz
    
            function sendChunk() {
                if (!isRunning) return;
                if (offset >= buffer.length) {
                    isRunning = false; // Stop sending audio
                    console.log('[File Read End]');
                    if (ws.readyState === WebSocket.OPEN) {
                        if (!enableServerVad) {
                            const commitEvent = {
                                event_id: 'event_789',
                                type: 'input_audio_buffer.commit'
                            };
                            ws.send(JSON.stringify(commitEvent));
                            console.log('[Send Commit Event]');
                        }
    
                        const finishEvent = {
                            event_id: 'event_987',
                            type: 'session.finish'
                        };
                        ws.send(JSON.stringify(finishEvent));
                        console.log('[Send Finish Event]');
                    }
                    
                    return;
                }
    
                if (ws.readyState !== WebSocket.OPEN) {
                    console.log('[Stop] WebSocket is not open.');
                    return;
                }
    
                const chunk = buffer.slice(offset, offset + chunkSize);
                offset += chunkSize;
    
                const encoded = chunk.toString('base64');
                const appendEvent = {
                    event_id: `event_${Date.now()}`,
                    type: 'input_audio_buffer.append',
                    audio: encoded
                };
    
                ws.send(JSON.stringify(appendEvent));
                console.log(`[Send Audio Event] ${appendEvent.event_id}`);
    
                setTimeout(sendChunk, 100); // Simulate real-time sending
            }
    
            sendChunk();
        }, 3000); // Wait for the session configuration to complete
    }
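The raw WebSocket samples above log every event as is. To separate intermediate results from final results, a client can dispatch on the event type. The sketch below assumes that raw WebSocket events carry the same `type`, `text`, `stash`, and `transcript` fields that the DashScope SDK samples on this page read in their callbacks. `extract_transcript` is a hypothetical helper, not part of any library.

```python
import json

# Event type names as used in the SDK samples on this page.
PARTIAL_EVENT = "conversation.item.input_audio_transcription.text"
FINAL_EVENT = "conversation.item.input_audio_transcription.completed"


def extract_transcript(raw_message):
    """Return ("partial" | "final", text) for transcription events, else None."""
    event = json.loads(raw_message)
    event_type = event.get("type")
    if event_type == PARTIAL_EVENT:
        # The SDK samples concatenate the `text` and `stash` fields
        # to display an intermediate result.
        return "partial", event.get("text", "") + event.get("stash", "")
    if event_type == FINAL_EVENT:
        return "final", event.get("transcript", "")
    return None


sample = json.dumps({"type": PARTIAL_EVENT, "text": "hello", "stash": " wor"})
print(extract_transcript(sample))  # ('partial', 'hello wor')
```

In the Python WebSocket sample, a helper like this could be called from `on_message` before the `session.finished` check, so that only transcription events are surfaced to the user.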

API reference

Real-time speech recognition - Qwen API reference

Model features

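| Feature | qwen3-asr-flash-realtime, qwen3-asr-flash-realtime-2026-02-10, qwen3-asr-flash-realtime-2025-10-27 |
| --- | --- |
| Supported languages | Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish |
| Supported audio formats | pcm, opus |
| Sample rate | 8 kHz, 16 kHz |
| Channels | Mono |
| Input format | Binary audio stream |
| Audio size/duration | Unlimited |
| Emotion recognition | Supported, always on |
| Sensitive word filtering | Not supported |
| Speaker diarization | Not supported |
| Filler word filtering | Not supported |
| Timestamps | Not supported |
| Punctuation prediction | Supported, always on |
| Hotwords | Not supported |
| ITN (Inverse Text Normalization) | Not supported |
| VAD (Voice Activity Detection) | Supported, always on |
| Rate limit (RPS) | 20 |
| Access methods | Java/Python SDK, WebSocket API |
| Price | International: $0.00009/second; Chinese mainland: $0.000047/second |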
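For budgeting, the per-second prices listed above can be turned into a rough estimate. The sketch below assumes billing is strictly linear in audio seconds, with no rounding, minimum charge, or free quota; the page does not state the billing rules, so treat the result as an approximation. The names `PRICE_PER_SECOND_USD` and `estimate_cost_usd` are hypothetical.

```python
from decimal import Decimal

# Per-second prices as listed on this page.
PRICE_PER_SECOND_USD = {
    "international": Decimal("0.00009"),
    "chinese_mainland": Decimal("0.000047"),
}


def estimate_cost_usd(audio_seconds, deployment: str) -> Decimal:
    """Estimate cost, assuming linear per-second billing (an assumption)."""
    # Decimal avoids binary floating-point error on very small unit prices.
    return PRICE_PER_SECOND_USD[deployment] * Decimal(str(audio_seconds))


one_hour = 3600
print(estimate_cost_usd(one_hour, "international"))
print(estimate_cost_usd(one_hour, "chinese_mainland"))
```

Under this assumption, one hour of audio costs about $0.324 on the international deployment and about $0.1692 on the Chinese mainland deployment.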