The real-time speech recognition service converts audio streams into punctuated text on the fly, so text appears as the speaker talks. It transcribes microphone input, meeting recordings, and local audio files alike, and is widely used for live meeting notes, live-stream captions, voice chat, and intelligent customer service.
Core features
Multilingual real-time recognition covering Chinese, English, and many Chinese dialects
Hotword customization to boost recognition accuracy for domain-specific terms
Timestamped output for structured recognition results
Flexible sample rates and multiple audio formats to fit different recording environments
Optional VAD (Voice Activity Detection) that filters out silence automatically and speeds up long-audio processing
SDK and WebSocket access with low-latency, stable service
Scope
Supported models:
International
In the international deployment mode, the access point and data storage are both located in the Singapore region, and model inference compute is scheduled dynamically worldwide (excluding the Chinese mainland).
When calling the following models, use an API Key for the Singapore region:
Fun-ASR: fun-asr-realtime (stable version, currently identical to fun-asr-realtime-2025-11-07), fun-asr-realtime-2025-11-07 (snapshot version)
Chinese mainland
In the Chinese mainland deployment mode, the access point and data storage are both located in the Beijing region, and model inference compute stays within the Chinese mainland.
When calling the following models, use an API Key for the Beijing region:
Fun-ASR: fun-asr-realtime (stable version, currently identical to fun-asr-realtime-2025-11-07), fun-asr-realtime-2025-11-07 (snapshot version), fun-asr-realtime-2025-09-15 (snapshot version)
Paraformer: paraformer-realtime-v2, paraformer-realtime-v1, paraformer-realtime-8k-v2, paraformer-realtime-8k-v1
For more information, see the model list.
Model selection
| Scenario | Recommended models | Why |
| --- | --- | --- |
| Mandarin Chinese recognition (meetings/live streams) | fun-asr-realtime, fun-asr-realtime-2025-11-07, paraformer-realtime-v2 | Broad format compatibility, high sample-rate support, stable latency |
| Multilingual recognition (international conferences) | paraformer-realtime-v2 | Covers many languages |
| Chinese dialect recognition (customer service/government) | fun-asr-realtime-2025-11-07, paraformer-realtime-v2 | Covers dialects from many regions |
| Mixed Chinese-English-Japanese recognition (classrooms/lectures) | fun-asr-realtime, fun-asr-realtime-2025-11-07 | Optimized for Chinese, English, and Japanese |
| Low-bandwidth telephone transcription | paraformer-realtime-8k-v2 | 8 kHz support, emotion recognition on by default |
| Hotword customization (brand names/terminology) | Paraformer and the latest Fun-ASR models | Hotwords can be toggled and configurations iterated easily |
For more details, see the model feature comparison below.
Quick start
Below is sample code for calling the API. For more code examples covering common scenarios, see GitHub.
You must have obtained an API Key and configured it as an environment variable (that page is scheduled for retirement and is being merged into Configure an API Key). To call through an SDK, you also need to install the DashScope SDK.
Fun-ASR
Recognize speech from the microphone
Real-time speech recognition can transcribe speech captured from the microphone and emit results as the user speaks, so text appears while they talk.
Java
import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.utils.Constants;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;
import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {
    public static void main(String[] args) throws InterruptedException {
        // The URL below is for the Singapore region. For models in the Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/inference
        Constants.baseWebsocketApiUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference";
        ExecutorService executorService = Executors.newSingleThreadExecutor();
        executorService.submit(new RealtimeRecognitionTask());
        executorService.shutdown();
        executorService.awaitTermination(1, TimeUnit.MINUTES);
        System.exit(0);
    }
}

class RealtimeRecognitionTask implements Runnable {
    @Override
    public void run() {
        RecognitionParam param = RecognitionParam.builder()
            .model("fun-asr-realtime")
            // API Keys differ between the Singapore and Beijing regions. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If no environment variable is configured, replace the next line with your Model Studio API Key: .apiKey("sk-xxx")
            .apiKey(System.getenv("DASHSCOPE_API_KEY"))
            .format("pcm")
            .sampleRate(16000)
            .build();
        Recognition recognizer = new Recognition();
        ResultCallback<RecognitionResult> callback = new ResultCallback<RecognitionResult>() {
            @Override
            public void onEvent(RecognitionResult result) {
                if (result.isSentenceEnd()) {
                    System.out.println("Final Result: " + result.getSentence().getText());
                } else {
                    System.out.println("Intermediate Result: " + result.getSentence().getText());
                }
            }
            @Override
            public void onComplete() {
                System.out.println("Recognition complete");
            }
            @Override
            public void onError(Exception e) {
                System.out.println("RecognitionCallback error: " + e.getMessage());
            }
        };
        try {
            recognizer.call(param, callback);
            // Create the audio format
            AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
            // Open the default recording device matching this format
            TargetDataLine targetDataLine = AudioSystem.getTargetDataLine(audioFormat);
            targetDataLine.open(audioFormat);
            // Start recording
            targetDataLine.start();
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            long start = System.currentTimeMillis();
            // Record for 50 s and transcribe in real time
            while (System.currentTimeMillis() - start < 50000) {
                int read = targetDataLine.read(buffer.array(), 0, buffer.capacity());
                if (read > 0) {
                    buffer.limit(read);
                    // Send the recorded audio to the streaming recognition service
                    recognizer.sendAudioFrame(buffer);
                    buffer = ByteBuffer.allocate(1024);
                    // Recording is rate-limited; sleep briefly to avoid pegging the CPU
                    Thread.sleep(20);
                }
            }
            recognizer.stop();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the WebSocket connection when the task finishes
            recognizer.getDuplexApi().close(1000, "bye");
        }
        System.out.println(
            "[Metric] requestId: "
                + recognizer.getLastRequestId()
                + ", first package delay ms: "
                + recognizer.getFirstPackageDelay()
                + ", last package delay ms: "
                + recognizer.getLastPackageDelay());
    }
}
Python
Before running the Python sample, install the third-party audio playback and capture package with pip install pyaudio.
import os
import signal  # for keyboard event handling (press "Ctrl+C" to terminate recording)
import sys

import dashscope
import pyaudio
from dashscope.audio.asr import *

mic = None
stream = None

# Recording parameters
sample_rate = 16000  # sampling rate (Hz)
channels = 1  # mono channel
dtype = 'int16'  # data type
format_pcm = 'pcm'  # the format of the audio data
block_size = 3200  # number of frames per buffer


# Real-time speech recognition callback
class Callback(RecognitionCallback):
    def on_open(self) -> None:
        global mic
        global stream
        print('RecognitionCallback open.')
        mic = pyaudio.PyAudio()
        stream = mic.open(format=pyaudio.paInt16,
                          channels=1,
                          rate=16000,
                          input=True)

    def on_close(self) -> None:
        global mic
        global stream
        print('RecognitionCallback close.')
        stream.stop_stream()
        stream.close()
        mic.terminate()
        stream = None
        mic = None

    def on_complete(self) -> None:
        print('RecognitionCallback completed.')  # recognition completed

    def on_error(self, message) -> None:
        print('RecognitionCallback task_id: ', message.request_id)
        print('RecognitionCallback error: ', message.message)
        # Stop and close the audio stream if it is running
        if stream is not None and stream.is_active():
            stream.stop_stream()
            stream.close()
        # Forcefully exit the program
        sys.exit(1)

    def on_event(self, result: RecognitionResult) -> None:
        sentence = result.get_sentence()
        if 'text' in sentence:
            print('RecognitionCallback text: ', sentence['text'])
            if RecognitionResult.is_sentence_end(sentence):
                print(
                    'RecognitionCallback sentence end, request_id:%s, usage:%s'
                    % (result.get_request_id(), result.get_usage(sentence)))


def signal_handler(sig, frame):
    print('Ctrl+C pressed, stop recognition ...')
    # Stop recognition
    recognition.stop()
    print('Recognition stopped.')
    print(
        '[Metric] requestId: {}, first package delay ms: {}, last package delay ms: {}'
        .format(
            recognition.get_last_request_id(),
            recognition.get_first_package_delay(),
            recognition.get_last_package_delay(),
        ))
    # Forcefully exit the program
    sys.exit(0)


# main function
if __name__ == '__main__':
    # API Keys differ between the Singapore and Beijing regions. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If no environment variable is configured, replace the next line with your Model Studio API Key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')
    # The URL below is for the Singapore region. For models in the Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/inference
    dashscope.base_websocket_api_url = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference'
    # Create the recognition callback
    callback = Callback()
    # Call the recognition service in async mode; you can customize parameters
    # such as model, format, and sample_rate
    recognition = Recognition(
        model='fun-asr-realtime',
        format=format_pcm,
        # 'pcm', 'wav', 'opus', 'speex', 'aac', 'amr'; see the documentation for supported formats
        sample_rate=sample_rate,
        # supports 8000, 16000
        semantic_punctuation_enabled=False,
        callback=callback)
    # Start recognition
    recognition.start()
    signal.signal(signal.SIGINT, signal_handler)
    print("Press 'Ctrl+C' to stop recording and recognition...")
    # Keep reading from the microphone until "Ctrl+C" is pressed
    while True:
        if stream:
            data = stream.read(3200, exception_on_overflow=False)
            recognition.send_audio_frame(data)
        else:
            break
    recognition.stop()
Recognize a local audio file
Real-time speech recognition can also transcribe local audio files. This interface suits shorter, quasi-real-time scenarios such as conversational chat, voice commands, voice input methods, and voice search.
Java
The audio file used in this sample: asr_example.wav.
import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.utils.Constants;
import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class TimeUtils {
    private static final DateTimeFormatter formatter =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
    public static String getTimestamp() {
        return LocalDateTime.now().format(formatter);
    }
}

public class Main {
    public static void main(String[] args) throws InterruptedException {
        // The URL below is for the Singapore region. For models in the Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/inference
        Constants.baseWebsocketApiUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference";
        ExecutorService executorService = Executors.newSingleThreadExecutor();
        executorService.submit(new RealtimeRecognitionTask(Paths.get(System.getProperty("user.dir"), "asr_example.wav")));
        executorService.shutdown();
        // wait for all tasks to complete
        executorService.awaitTermination(1, TimeUnit.MINUTES);
        System.exit(0);
    }
}

class RealtimeRecognitionTask implements Runnable {
    private Path filepath;

    public RealtimeRecognitionTask(Path filepath) {
        this.filepath = filepath;
    }

    @Override
    public void run() {
        RecognitionParam param = RecognitionParam.builder()
            .model("fun-asr-realtime")
            // API Keys differ between the Singapore and Beijing regions. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If no environment variable is configured, replace the next line with your Model Studio API Key: .apiKey("sk-xxx")
            .apiKey(System.getenv("DASHSCOPE_API_KEY"))
            .format("wav")
            .sampleRate(16000)
            .build();
        Recognition recognizer = new Recognition();
        String threadName = Thread.currentThread().getName();
        ResultCallback<RecognitionResult> callback = new ResultCallback<RecognitionResult>() {
            @Override
            public void onEvent(RecognitionResult message) {
                if (message.isSentenceEnd()) {
                    System.out.println(TimeUtils.getTimestamp() + " "
                        + "[process " + threadName + "] Final Result: " + message.getSentence().getText());
                } else {
                    System.out.println(TimeUtils.getTimestamp() + " "
                        + "[process " + threadName + "] Intermediate Result: " + message.getSentence().getText());
                }
            }
            @Override
            public void onComplete() {
                System.out.println(TimeUtils.getTimestamp() + " " + "[" + threadName + "] Recognition complete");
            }
            @Override
            public void onError(Exception e) {
                System.out.println(TimeUtils.getTimestamp() + " "
                    + "[" + threadName + "] RecognitionCallback error: " + e.getMessage());
            }
        };
        try {
            recognizer.call(param, callback);
            // Replace the path with your own audio file path
            System.out.println(TimeUtils.getTimestamp() + " " + "[" + threadName + "] Input file_path is: " + this.filepath);
            // Read the file and send the audio in chunks
            FileInputStream fis = new FileInputStream(this.filepath.toFile());
            // 3200 bytes = 100 ms of 16 kHz, 16-bit mono audio
            byte[] buffer = new byte[3200];
            int bytesRead;
            // Loop to read chunks of the file
            while ((bytesRead = fis.read(buffer)) != -1) {
                ByteBuffer byteBuffer;
                // Handle the last chunk, which might be smaller than the buffer size
                System.out.println(TimeUtils.getTimestamp() + " " + "[" + threadName + "] bytesRead: " + bytesRead);
                if (bytesRead < buffer.length) {
                    byteBuffer = ByteBuffer.wrap(buffer, 0, bytesRead);
                } else {
                    byteBuffer = ByteBuffer.wrap(buffer);
                }
                recognizer.sendAudioFrame(byteBuffer);
                buffer = new byte[3200];
                Thread.sleep(100);
            }
            System.out.println(TimeUtils.getTimestamp() + " " + LocalDateTime.now());
            recognizer.stop();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the WebSocket connection when the task finishes
            recognizer.getDuplexApi().close(1000, "bye");
        }
        System.out.println(
            "["
                + threadName
                + "][Metric] requestId: "
                + recognizer.getLastRequestId()
                + ", first package delay ms: "
                + recognizer.getFirstPackageDelay()
                + ", last package delay ms: "
                + recognizer.getLastPackageDelay());
    }
}
Python
The audio file used in this sample: asr_example.wav.
import os
import time
from datetime import datetime

import dashscope
from dashscope.audio.asr import *

# API Keys differ between the Singapore and Beijing regions. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If no environment variable is configured, replace the next line with your Model Studio API Key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')
# The URL below is for the Singapore region. For models in the Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/inference
dashscope.base_websocket_api_url = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference'


def get_timestamp():
    now = datetime.now()
    formatted_timestamp = now.strftime("[%Y-%m-%d %H:%M:%S.%f]")
    return formatted_timestamp


class Callback(RecognitionCallback):
    def on_complete(self) -> None:
        print(get_timestamp() + ' Recognition completed')  # recognition complete

    def on_error(self, result: RecognitionResult) -> None:
        print('Recognition task_id: ', result.request_id)
        print('Recognition error: ', result.message)
        exit(0)

    def on_event(self, result: RecognitionResult) -> None:
        sentence = result.get_sentence()
        if 'text' in sentence:
            print(get_timestamp() + ' RecognitionCallback text: ', sentence['text'])
            if RecognitionResult.is_sentence_end(sentence):
                print(get_timestamp() +
                      ' RecognitionCallback sentence end, request_id:%s, usage:%s'
                      % (result.get_request_id(), result.get_usage(sentence)))


callback = Callback()
recognition = Recognition(model='fun-asr-realtime',
                          format='wav',
                          sample_rate=16000,
                          callback=callback)
recognition.start()

try:
    audio_data: bytes = None
    f = open("asr_example.wav", 'rb')
    if os.path.getsize("asr_example.wav"):
        while True:
            audio_data = f.read(3200)
            if not audio_data:
                break
            else:
                recognition.send_audio_frame(audio_data)
                time.sleep(0.1)
    else:
        raise Exception('The supplied file was empty (zero bytes long)')
    f.close()
except Exception as e:
    raise e

recognition.stop()
print(
    '[Metric] requestId: {}, first package delay ms: {}, last package delay ms: {}'
    .format(
        recognition.get_last_request_id(),
        recognition.get_first_package_delay(),
        recognition.get_last_package_delay(),
    ))
Paraformer
Recognize speech from the microphone
Real-time speech recognition can transcribe speech captured from the microphone and emit results as the user speaks, so text appears while they talk.
Java
import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.BackpressureStrategy;
import io.reactivex.Flowable;
import java.nio.ByteBuffer;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;

public class Main {
    public static void main(String[] args) throws NoApiKeyException {
        // Create a Flowable<ByteBuffer>
        Flowable<ByteBuffer> audioSource = Flowable.create(emitter -> {
            new Thread(() -> {
                try {
                    // Create the audio format
                    AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
                    // Open the default recording device matching this format
                    TargetDataLine targetDataLine = AudioSystem.getTargetDataLine(audioFormat);
                    targetDataLine.open(audioFormat);
                    // Start recording
                    targetDataLine.start();
                    ByteBuffer buffer = ByteBuffer.allocate(1024);
                    long start = System.currentTimeMillis();
                    // Record for 300 s and transcribe in real time
                    while (System.currentTimeMillis() - start < 300000) {
                        int read = targetDataLine.read(buffer.array(), 0, buffer.capacity());
                        if (read > 0) {
                            buffer.limit(read);
                            // Send the recorded audio to the streaming recognition service
                            emitter.onNext(buffer);
                            buffer = ByteBuffer.allocate(1024);
                            // Recording is rate-limited; sleep briefly to avoid pegging the CPU
                            Thread.sleep(20);
                        }
                    }
                    // Signal the end of transcription
                    emitter.onComplete();
                } catch (Exception e) {
                    emitter.onError(e);
                }
            }).start();
        },
        BackpressureStrategy.BUFFER);
        // Create the recognizer
        Recognition recognizer = new Recognition();
        // Create RecognitionParam; the Flowable<ByteBuffer> above is passed in as the audio source
        RecognitionParam param = RecognitionParam.builder()
            .model("paraformer-realtime-v2")
            .format("pcm")
            .sampleRate(16000)
            // If the API Key is not in an environment variable, uncomment the next line and replace apiKey with your own API Key
            // .apiKey("apikey")
            .build();
        // Call the streaming interface
        recognizer.streamCall(param, audioSource)
            // Subscribe to the result stream
            .blockingForEach(
                result -> {
                    // Print the recognition results
                    if (result.isSentenceEnd()) {
                        System.out.println("Fix:" + result.getSentence().getText());
                    } else {
                        System.out.println("Result:" + result.getSentence().getText());
                    }
                });
        System.exit(0);
    }
}
Python
Before running the Python sample, install the third-party audio playback and capture package with pip install pyaudio.
import pyaudio
from dashscope.audio.asr import (Recognition, RecognitionCallback,
                                 RecognitionResult)

# If the API Key is not in an environment variable, uncomment the lines below and replace apiKey with your own API Key
# import dashscope
# dashscope.api_key = "apiKey"

mic = None
stream = None


class Callback(RecognitionCallback):
    def on_open(self) -> None:
        global mic
        global stream
        print('RecognitionCallback open.')
        mic = pyaudio.PyAudio()
        stream = mic.open(format=pyaudio.paInt16,
                          channels=1,
                          rate=16000,
                          input=True)

    def on_close(self) -> None:
        global mic
        global stream
        print('RecognitionCallback close.')
        stream.stop_stream()
        stream.close()
        mic.terminate()
        stream = None
        mic = None

    def on_event(self, result: RecognitionResult) -> None:
        print('RecognitionCallback sentence: ', result.get_sentence())


callback = Callback()
recognition = Recognition(model='paraformer-realtime-v2',
                          format='pcm',
                          sample_rate=16000,
                          callback=callback)
recognition.start()

while True:
    if stream:
        data = stream.read(3200, exception_on_overflow=False)
        recognition.send_audio_frame(data)
    else:
        break

recognition.stop()
Recognize a local audio file
Real-time speech recognition can also transcribe local audio files. This interface suits shorter, quasi-real-time scenarios such as conversational chat, voice commands, voice input methods, and voice search.
Java
import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class Main {
    public static void main(String[] args) {
        // You can skip the download step and call the API directly with a local file
        String exampleWavUrl =
            "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav";
        try {
            InputStream in = new URL(exampleWavUrl).openStream();
            Files.copy(in, Paths.get("asr_example.wav"), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            System.out.println("error: " + e);
            System.exit(1);
        }
        // Create a Recognition instance
        Recognition recognizer = new Recognition();
        // Create RecognitionParam
        RecognitionParam param =
            RecognitionParam.builder()
                // If the API Key is not in an environment variable, uncomment the next line and replace apiKey with your own API Key
                // .apiKey("apikey")
                .model("paraformer-realtime-v2")
                .format("wav")
                .sampleRate(16000)
                // "language_hints" is only supported by the paraformer-v2 and paraformer-realtime-v2 models
                .parameter("language_hints", new String[] {"zh", "en"})
                .build();
        try {
            System.out.println("Recognition result: " + recognizer.call(param, new File("asr_example.wav")));
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(0);
    }
}
Python
import requests
from http import HTTPStatus
from dashscope.audio.asr import Recognition

# If the API Key is not in an environment variable, uncomment the lines below and replace apiKey with your own API Key
# import dashscope
# dashscope.api_key = "apiKey"

# You can skip the download step and use a local file directly
r = requests.get(
    'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav'
)
with open('asr_example.wav', 'wb') as f:
    f.write(r.content)

recognition = Recognition(model='paraformer-realtime-v2',
                          format='wav',
                          sample_rate=16000,
                          # "language_hints" is only supported by the paraformer-v2 and paraformer-realtime-v2 models
                          language_hints=['zh', 'en'],
                          callback=None)
result = recognition.call('asr_example.wav')
if result.status_code == HTTPStatus.OK:
    print('Recognition result:')
    print(result.get_sentence())
else:
    print('Error: ', result.message)
Use in production
Improve recognition quality
Choose a model with the right sample rate: recognize 8 kHz telephone audio with an 8 kHz model directly rather than upsampling it to 16 kHz; this avoids distorting the signal and gives better results.
Use hotwords: configuring hotwords for domain-specific terms, person names, and brand names significantly improves their recognition accuracy. For details, see Customize hotwords - Paraformer/Fun-ASR.
Optimize input audio quality: use a good microphone and record in an environment with a high signal-to-noise ratio and no echo. At the application layer, you can preprocess the audio with noise suppression (such as RNNoise) and acoustic echo cancellation (AEC) for a cleaner signal.
Specify the language explicitly: for multilingual models such as paraformer-realtime-v2, telling the model the audio's language up front (for example, setting the language_hints parameter to ['zh', 'en']) helps it converge and avoids confusing similar-sounding languages.
Filter filler words: for Paraformer models, enable the disfluency_removal_enabled parameter to get more polished, readable text. The sketch after this list combines these tuning parameters.
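The following is a minimal sketch of a production-tuned synchronous call with the Python SDK, combining the parameters above. The vocabulary_id value 'vocab-xxx' is a placeholder for a hotword vocabulary created per the hotword customization guide, and asr_example.wav stands in for your own 16 kHz audio file.
import os

import dashscope
from dashscope.audio.asr import Recognition

dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')

recognition = Recognition(
    model='paraformer-realtime-v2',
    format='wav',
    sample_rate=16000,                # match the audio; pick an 8 kHz model for telephone audio
    language_hints=['zh', 'en'],      # only paraformer-v2 / paraformer-realtime-v2
    disfluency_removal_enabled=True,  # filter filler words (Paraformer models)
    vocabulary_id='vocab-xxx',        # placeholder: the ID of a hotword vocabulary you created
    callback=None)
result = recognition.call('asr_example.wav')
print(result.get_sentence())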
Build in fault tolerance
Client-side reconnection: the client should reconnect automatically after a dropped connection to ride out network jitter. Taking the Python SDK as an example, you can follow these suggestions (see the sketch after this list):
Catch exceptions: implement the on_error method in your Callback class; the DashScope SDK invokes it when it hits network errors or other problems.
Signal the state: when on_error fires, set a reconnect flag. In Python you can use threading.Event, a thread-safe signal flag.
Reconnect loop: wrap the main logic in a for loop (for example, 3 retries). When the reconnect flag is detected, abort the current round of recognition, clean up resources, wait a few seconds, and go around the loop again to establish a brand-new connection.
Keep the connection alive with heartbeats: to hold a long-lived connection to the server, set the heartbeat parameter to true; the connection then stays open even through long stretches of silence in the audio.
Model rate limits: mind each model's throttling rules when calling the API.
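Below is a minimal sketch of this reconnect pattern with the Python SDK. The retry count, event flag, and backoff values are illustrative, and the local file asr_example.pcm stands in for your real audio source.
import os
import threading
import time

import dashscope
from dashscope.audio.asr import Recognition, RecognitionCallback, RecognitionResult

dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')
need_reconnect = threading.Event()  # thread-safe reconnect signal


class Callback(RecognitionCallback):
    def on_error(self, message) -> None:
        print('RecognitionCallback error: ', message.message)
        need_reconnect.set()  # tell the main loop to rebuild the connection

    def on_event(self, result: RecognitionResult) -> None:
        sentence = result.get_sentence()
        if 'text' in sentence:
            print('RecognitionCallback text: ', sentence['text'])


for attempt in range(3):  # retry up to 3 times
    need_reconnect.clear()
    recognition = Recognition(model='paraformer-realtime-v2',
                              format='pcm',
                              sample_rate=16000,
                              heartbeat=True,  # keep the connection alive through silence
                              callback=Callback())
    recognition.start()
    with open('asr_example.pcm', 'rb') as f:  # stand-in for your audio source
        while not need_reconnect.is_set():
            data = f.read(3200)
            if not data:
                break
            recognition.send_audio_frame(data)
            time.sleep(0.1)
    try:
        recognition.stop()  # clean up this round's resources
    except Exception:
        pass  # the connection may already be gone after an error
    if not need_reconnect.is_set():
        break  # finished normally, no reconnect needed
    time.sleep(3)  # back off a few seconds before reconnecting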
API reference
Model feature comparison
| Feature | fun-asr-realtime, fun-asr-realtime-2025-11-07 | fun-asr-realtime-2025-09-15 | paraformer-realtime-v2 | paraformer-realtime-v1 | paraformer-realtime-8k-v2 | paraformer-realtime-8k-v1 |
| --- | --- | --- | --- | --- | --- | --- |
| Core scenarios | ApsaraVideo for Live, meetings, trilingual teaching, etc. | ApsaraVideo for Live, meetings, bilingual teaching, etc. | Long-form streaming recognition (meetings, live streams) | Long-form streaming recognition (meetings, live streams) | Telephone customer service, etc. | Telephone customer service, etc. |
| Supported languages | Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, Jin; plus Zhongyuan, Southwestern, Ji-Lu, Jianghuai, Lan-Yin, Jiao-Liao, Northeastern, Beijing, Hong Kong/Taiwan and other Mandarin accents, covering Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, Ningxia, and other regions), English, Japanese | Chinese (Mandarin), English | Chinese (Mandarin, Cantonese, Wu, Minnan, and the dialects of Northeast China, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, Shanghai), English, Japanese, Korean, German, French, Russian | Chinese (Mandarin) | Chinese (Mandarin) | Chinese (Mandarin) |
| Supported audio formats | pcm, wav, mp3, opus, speex, aac, amr | pcm, wav, mp3, opus, speex, aac, amr | pcm, wav, mp3, opus, speex, aac, amr | pcm, wav, mp3, opus, speex, aac, amr | pcm, wav, mp3, opus, speex, aac, amr | pcm, wav, mp3, opus, speex, aac, amr |
| Sample rate | 16 kHz | 16 kHz | Any sample rate | 16 kHz | 8 kHz | 8 kHz |
| Channels | Mono | Mono | Mono | Mono | Mono | Mono |
| Input form | Binary audio stream | Binary audio stream | Binary audio stream | Binary audio stream | Binary audio stream | Binary audio stream |
| Audio size/duration | Unlimited | Unlimited | Unlimited | Unlimited | Unlimited | Unlimited |
| Emotion recognition | | | | | On by default, can be disabled | |
| Sensitive word filtering | | | | | | |
| Speaker diarization | | | | | | |
| Filler-word filtering | | | Off by default, can be enabled | Off by default, can be enabled | Off by default, can be enabled | Off by default, can be enabled |
| Timestamps | Always on | Always on | Always on | Always on | Always on | Always on |
| Punctuation prediction | Always on | On by default, can be disabled | Always on | On by default, can be disabled | Always on | Always on |
| Hotwords | Configurable | Configurable | Configurable | Configurable | Configurable | Configurable |
| ITN | Always on | Always on | Always on | Always on | Always on | Always on |
| VAD | Always on | Always on | Always on | Always on | Always on | Always on |
| Rate limit (RPS) | 20 | 20 | 20 | 20 | 20 | 20 |
| Access | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API |
| Price | International: $0.00009/second; Chinese mainland: $0.000047/second | Chinese mainland: $0.000047/second | Chinese mainland: $0.000012/second | Chinese mainland: $0.000012/second | Chinese mainland: $0.000012/second | Chinese mainland: $0.000012/second |