Converts audio streams (from a microphone, a meeting recording, or a local file) into punctuated text in real time. Use it for meeting transcription, live captions, voice chat, and customer service.
Core features
Supports real-time recognition for multiple languages (Chinese, English, dialects).
Provides timestamp output for structured results.
Flexible sample rates and multiple audio formats adapt to different recording environments.
Voice Activity Detection (VAD) is optional and filters silent segments in long audio.
SDK and WebSocket connection types provide a low-latency, stable service.
Availability
Supported models:
International
In the international deployment mode, the endpoint and data storage are located in the Singapore region. Inference computing resources are dynamically scheduled worldwide, excluding the Chinese mainland.
Use an API key from the Singapore region:
Fun-ASR: fun-asr-realtime (stable, equivalent to fun-asr-realtime-2025-11-07), fun-asr-realtime-2025-11-07 (snapshot)
Chinese mainland
In the Chinese mainland deployment mode, the endpoint and data storage are located in the Beijing region. Inference computing resources are limited to the Chinese mainland.
Use an API key from the Beijing region:
Fun-ASR: fun-asr-realtime (stable, equivalent to fun-asr-realtime-2025-11-07), fun-asr-realtime-2026-02-28 (latest snapshot), fun-asr-realtime-2025-11-07 (snapshot), fun-asr-realtime-2025-09-15 (snapshot)
fun-asr-flash-8k-realtime (stable, equivalent to fun-asr-flash-8k-realtime-2026-01-28), fun-asr-flash-8k-realtime-2026-01-28 (snapshot)
Paraformer: paraformer-realtime-v2, paraformer-realtime-v1, paraformer-realtime-8k-v2, paraformer-realtime-8k-v1
For more information, see the model list.
Model selection
| Scenario | Recommended models | Reason |
| --- | --- | --- |
| Mandarin Chinese recognition (meetings, live streaming) | fun-asr-realtime, fun-asr-realtime-2026-02-28, paraformer-realtime-v2 | Supports multiple formats, high sample rates, and stable latency. |
| Multi-language recognition (international conferences) | paraformer-realtime-v2 | Covers multiple languages. |
| Chinese dialect recognition (customer service, government affairs) | fun-asr-realtime-2026-02-28, paraformer-realtime-v2 | Covers multiple local dialects. |
| Mixed Chinese, English, and Japanese recognition (classrooms, speeches) | fun-asr-realtime, fun-asr-realtime-2025-11-07 | Optimized for Chinese, English, and Japanese recognition. |
| Low-bandwidth phone recording transcription | fun-asr-flash-8k-realtime | Designed specifically for Chinese-language telephone customer service. |
For more information, see Compare models.
Getting started
The following are code samples for calling the API. For more code samples for common scenarios, see GitHub.
Get an API key and export it as an environment variable. If you call the service through an SDK, install the DashScope SDK.
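The streaming samples below send audio in 3,200-byte frames. For 16 kHz, 16-bit, mono PCM, that works out to 100 ms of audio per frame. A quick sanity check of that arithmetic (the helper name here is illustrative, not part of the SDK):

```python
def pcm_chunk_bytes(sample_rate_hz: int, sample_width_bytes: int,
                    channels: int, chunk_ms: int) -> int:
    """Number of bytes of raw PCM that cover chunk_ms milliseconds."""
    return sample_rate_hz * sample_width_bytes * channels * chunk_ms // 1000

# 100 ms of 16 kHz, 16-bit (2-byte), mono PCM -> the 3,200-byte frames
# used throughout the samples below.
print(pcm_chunk_bytes(16000, 2, 1, 100))  # 3200
# An 8 kHz telephone model needs half as many bytes per 100 ms.
print(pcm_chunk_bytes(8000, 2, 1, 100))   # 1600
```

If you change the sample rate or frame duration, recompute the frame size accordingly so each `send_audio_frame` call still carries a consistent slice of audio.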
Fun-ASR
Recognize speech from a microphone
This feature recognizes microphone input and displays results in real time ("speak-and-see").
Java
import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.utils.Constants;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;
import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class Main {
public static void main(String[] args) throws InterruptedException {
// The following URL is for the Singapore region. If you use models in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference
Constants.baseWebsocketApiUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference";
ExecutorService executorService = Executors.newSingleThreadExecutor();
executorService.submit(new RealtimeRecognitionTask());
executorService.shutdown();
executorService.awaitTermination(1, TimeUnit.MINUTES);
System.exit(0);
}
}
class RealtimeRecognitionTask implements Runnable {
@Override
public void run() {
RecognitionParam param = RecognitionParam.builder()
.model("fun-asr-realtime")
// API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.format("pcm")
.sampleRate(16000)
.build();
Recognition recognizer = new Recognition();
ResultCallback<RecognitionResult> callback = new ResultCallback<RecognitionResult>() {
@Override
public void onEvent(RecognitionResult result) {
if (result.isSentenceEnd()) {
System.out.println("Final Result: " + result.getSentence().getText());
} else {
System.out.println("Intermediate Result: " + result.getSentence().getText());
}
}
@Override
public void onComplete() {
System.out.println("Recognition complete");
}
@Override
public void onError(Exception e) {
System.out.println("RecognitionCallback error: " + e.getMessage());
}
};
try {
recognizer.call(param, callback);
// Create audio format
AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
// Match the default recording device based on the format
TargetDataLine targetDataLine =
AudioSystem.getTargetDataLine(audioFormat);
targetDataLine.open(audioFormat);
// Start recording
targetDataLine.start();
ByteBuffer buffer = ByteBuffer.allocate(1024);
long start = System.currentTimeMillis();
// Record for 50s and perform real-time transcription
while (System.currentTimeMillis() - start < 50000) {
int read = targetDataLine.read(buffer.array(), 0, buffer.capacity());
if (read > 0) {
buffer.limit(read);
// Send recorded audio data to the streaming recognition service
recognizer.sendAudioFrame(buffer);
buffer = ByteBuffer.allocate(1024);
// Sleep briefly to prevent high CPU usage due to limited recording speed
Thread.sleep(20);
}
}
recognizer.stop();
} catch (Exception e) {
e.printStackTrace();
} finally {
// Close the Websocket connection after the task ends
recognizer.getDuplexApi().close(1000, "bye");
}
System.out.println(
"[Metric] requestId: "
+ recognizer.getLastRequestId()
+ ", first package delay ms: "
+ recognizer.getFirstPackageDelay()
+ ", last package delay ms: "
+ recognizer.getLastPackageDelay());
}
}
Python
Before running the Python example, run pip install pyaudio
import os
import signal  # for keyboard events handling (press "Ctrl+C" to terminate recording)
import sys

import dashscope
import pyaudio
from dashscope.audio.asr import *

mic = None
stream = None

# Set recording parameters
sample_rate = 16000  # sampling rate (Hz)
channels = 1  # mono channel
dtype = 'int16'  # data type
format_pcm = 'pcm'  # the format of the audio data
block_size = 3200  # number of frames per buffer


# Real-time speech recognition callback
class Callback(RecognitionCallback):
    def on_open(self) -> None:
        global mic
        global stream
        print('RecognitionCallback open.')
        mic = pyaudio.PyAudio()
        stream = mic.open(format=pyaudio.paInt16,
                          channels=1,
                          rate=16000,
                          input=True)

    def on_close(self) -> None:
        global mic
        global stream
        print('RecognitionCallback close.')
        stream.stop_stream()
        stream.close()
        mic.terminate()
        stream = None
        mic = None

    def on_complete(self) -> None:
        print('RecognitionCallback completed.')  # recognition completed

    def on_error(self, message) -> None:
        print('RecognitionCallback task_id: ', message.request_id)
        print('RecognitionCallback error: ', message.message)
        # Stop and close the audio stream if it is still open
        if stream is not None and stream.is_active():
            stream.stop_stream()
            stream.close()
        # Forcefully exit the program
        sys.exit(1)

    def on_event(self, result: RecognitionResult) -> None:
        sentence = result.get_sentence()
        if 'text' in sentence:
            print('RecognitionCallback text: ', sentence['text'])
            if RecognitionResult.is_sentence_end(sentence):
                print(
                    'RecognitionCallback sentence end, request_id:%s, usage:%s'
                    % (result.get_request_id(), result.get_usage(sentence)))


def signal_handler(sig, frame):
    print('Ctrl+C pressed, stop recognition ...')
    # Stop recognition
    recognition.stop()
    print('Recognition stopped.')
    print(
        '[Metric] requestId: {}, first package delay ms: {}, last package delay ms: {}'
        .format(
            recognition.get_last_request_id(),
            recognition.get_first_package_delay(),
            recognition.get_last_package_delay(),
        ))
    # Forcefully exit the program
    sys.exit(0)


# main function
if __name__ == '__main__':
    # API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not set an environment variable, replace the next line with your Model Studio API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')
    # The following URL is for the Singapore region. To use models in the Beijing region, replace it with: wss://dashscope.aliyuncs.com/api-ws/v1/inference
    dashscope.base_websocket_api_url = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference'
    # Create the recognition callback
    callback = Callback()
    # Call the recognition service in async mode. You can customize parameters such as model, format, and sample_rate.
    recognition = Recognition(
        model='fun-asr-realtime',
        # Supported formats: 'pcm', 'wav', 'opus', 'speex', 'aac', 'amr'. See the documentation for details.
        format=format_pcm,
        # Supported sample rates: 8000, 16000
        sample_rate=sample_rate,
        semantic_punctuation_enabled=False,
        callback=callback)
    # Start recognition
    recognition.start()
    signal.signal(signal.SIGINT, signal_handler)
    print("Press 'Ctrl+C' to stop recording and recognition...")
    # Read from the microphone until "Ctrl+C" is pressed
    while True:
        if stream:
            data = stream.read(3200, exception_on_overflow=False)
            recognition.send_audio_frame(data)
        else:
            break
    recognition.stop()
Recognize a local audio file
This feature recognizes and transcribes local audio files. It is ideal for short audio scenarios like voice chats, commands, voice input, or voice search.
Java
Audio used in the example: asr_example.wav.
import com.alibaba.dashscope.api.GeneralApi;
import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionResult;
import com.alibaba.dashscope.base.HalfDuplexParamBase;
import com.alibaba.dashscope.common.GeneralListParam;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.protocol.GeneralServiceOption;
import com.alibaba.dashscope.protocol.HttpMethod;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.protocol.StreamingMode;
import com.alibaba.dashscope.utils.Constants;
import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
class TimeUtils {
private static final DateTimeFormatter formatter =
DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
public static String getTimestamp() {
return LocalDateTime.now().format(formatter);
}
}
public class Main {
public static void main(String[] args) throws InterruptedException {
// The following URL is for the Singapore region. If you use models in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference
Constants.baseWebsocketApiUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference";
// In actual applications, run this method only once at program startup.
warmUp();
ExecutorService executorService = Executors.newSingleThreadExecutor();
executorService.submit(new RealtimeRecognitionTask(Paths.get(System.getProperty("user.dir"), "asr_example.wav")));
executorService.shutdown();
// wait for all tasks to complete
executorService.awaitTermination(1, TimeUnit.MINUTES);
System.exit(0);
}
public static void warmUp() {
try {
// Lightweight GET request to establish connection
GeneralServiceOption warmupOption = GeneralServiceOption.builder()
.protocol(Protocol.HTTP)
.httpMethod(HttpMethod.GET)
.streamingMode(StreamingMode.OUT)
.path("assistants")
.build();
warmupOption.setBaseHttpUrl(Constants.baseHttpApiUrl);
GeneralApi<HalfDuplexParamBase> api = new GeneralApi<>();
api.get(GeneralListParam.builder().limit(1L).build(), warmupOption);
} catch (Exception e) {
// Warm-up is best-effort: ignore failures and proceed with recognition
}
}
}
class RealtimeRecognitionTask implements Runnable {
private Path filepath;
public RealtimeRecognitionTask(Path filepath) {
this.filepath = filepath;
}
@Override
public void run() {
RecognitionParam param = RecognitionParam.builder()
.model("fun-asr-realtime")
// API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.format("wav")
.sampleRate(16000)
.build();
Recognition recognizer = new Recognition();
String threadName = Thread.currentThread().getName();
ResultCallback<RecognitionResult> callback = new ResultCallback<RecognitionResult>() {
@Override
public void onEvent(RecognitionResult message) {
if (message.isSentenceEnd()) {
System.out.println(TimeUtils.getTimestamp()+" "+
"[process " + threadName + "] Final Result:" + message.getSentence().getText());
} else {
System.out.println(TimeUtils.getTimestamp()+" "+
"[process " + threadName + "] Intermediate Result: " + message.getSentence().getText());
}
}
@Override
public void onComplete() {
System.out.println(TimeUtils.getTimestamp()+" "+"[" + threadName + "] Recognition complete");
}
@Override
public void onError(Exception e) {
System.out.println(TimeUtils.getTimestamp()+" "+
"[" + threadName + "] RecognitionCallback error: " + e.getMessage());
}
};
try {
recognizer.call(param, callback);
// Please replace the path with your audio file path
System.out.println(TimeUtils.getTimestamp()+" "+"[" + threadName + "] Input file_path is: " + this.filepath);
// Read file and send audio by chunks
FileInputStream fis = new FileInputStream(this.filepath.toFile());
byte[] allData = new byte[fis.available()];
int ret = fis.read(allData);
fis.close();
int sendFrameLength = 3200;
for (int i = 0; i * sendFrameLength < allData.length; i ++) {
int start = i * sendFrameLength;
int end = Math.min(start + sendFrameLength, allData.length);
ByteBuffer byteBuffer = ByteBuffer.wrap(allData, start, end - start);
recognizer.sendAudioFrame(byteBuffer);
Thread.sleep(100);
}
System.out.println(TimeUtils.getTimestamp()+" "+LocalDateTime.now());
recognizer.stop();
} catch (Exception e) {
e.printStackTrace();
} finally {
// Close the Websocket connection after the task ends
recognizer.getDuplexApi().close(1000, "bye");
}
System.out.println(
"["
+ threadName
+ "][Metric] requestId: "
+ recognizer.getLastRequestId()
+ ", first package delay ms: "
+ recognizer.getFirstPackageDelay()
+ ", last package delay ms: "
+ recognizer.getLastPackageDelay());
}
}
Python
The audio file used in this example is asr_example.wav.
import os
import time
from datetime import datetime

import dashscope
from dashscope.audio.asr import *

# API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not set an environment variable, replace the next line with your Model Studio API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')
# The following URL is for the Singapore region. To use models in the Beijing region, replace it with: wss://dashscope.aliyuncs.com/api-ws/v1/inference
dashscope.base_websocket_api_url = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference'


def get_timestamp():
    now = datetime.now()
    formatted_timestamp = now.strftime('[%Y-%m-%d %H:%M:%S.%f]')
    return formatted_timestamp


class Callback(RecognitionCallback):
    def on_complete(self) -> None:
        print(get_timestamp() + ' Recognition completed')  # recognition complete

    def on_error(self, result: RecognitionResult) -> None:
        print('Recognition task_id: ', result.request_id)
        print('Recognition error: ', result.message)
        exit(0)

    def on_event(self, result: RecognitionResult) -> None:
        sentence = result.get_sentence()
        if 'text' in sentence:
            print(get_timestamp() + ' RecognitionCallback text: ', sentence['text'])
            if RecognitionResult.is_sentence_end(sentence):
                print(get_timestamp() +
                      ' RecognitionCallback sentence end, request_id:%s, usage:%s'
                      % (result.get_request_id(), result.get_usage(sentence)))


callback = Callback()

recognition = Recognition(model='fun-asr-realtime',
                          format='wav',
                          sample_rate=16000,
                          callback=callback)

try:
    f = open('asr_example.wav', 'rb')
    if os.path.getsize('asr_example.wav'):
        # Read all file data into the buffer at once
        file_buffer = f.read()
        f.close()
        print('Start Recognition')
        recognition.start()
        # Send data from the buffer in chunks of 3,200 bytes
        buffer_size = len(file_buffer)
        offset = 0
        chunk_size = 3200
        while offset < buffer_size:
            # Calculate the size of the current chunk
            remaining_bytes = buffer_size - offset
            current_chunk_size = min(chunk_size, remaining_bytes)
            # Extract the current chunk from the buffer
            audio_data = file_buffer[offset:offset + current_chunk_size]
            # Send the audio frame
            recognition.send_audio_frame(audio_data)
            # Update the offset
            offset += current_chunk_size
            # Add a delay to simulate real-time transmission
            time.sleep(0.1)
        recognition.stop()
    else:
        f.close()
        raise Exception('The supplied file was empty (zero bytes long)')
except Exception as e:
    raise e

print(
    '[Metric] requestId: {}, first package delay ms: {}, last package delay ms: {}'
    .format(
        recognition.get_last_request_id(),
        recognition.get_first_package_delay(),
        recognition.get_last_package_delay(),
    ))
Paraformer
Recognize speech from a microphone
This feature recognizes microphone input and displays results in real time ("speak-and-see").
Java
import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.BackpressureStrategy;
import io.reactivex.Flowable;
import java.nio.ByteBuffer;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;
public class Main {
public static void main(String[] args) throws NoApiKeyException {
// Create a Flowable<ByteBuffer>.
Flowable<ByteBuffer> audioSource = Flowable.create(emitter -> {
new Thread(() -> {
try {
// Create an audio format.
AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
// Match the default recording device based on the format.
TargetDataLine targetDataLine =
AudioSystem.getTargetDataLine(audioFormat);
targetDataLine.open(audioFormat);
// Start recording.
targetDataLine.start();
ByteBuffer buffer = ByteBuffer.allocate(1024);
long start = System.currentTimeMillis();
// Record for 300s and perform real-time transcription.
while (System.currentTimeMillis() - start < 300000) {
int read = targetDataLine.read(buffer.array(), 0, buffer.capacity());
if (read > 0) {
buffer.limit(read);
// Send the recorded audio data to the streaming recognition service.
emitter.onNext(buffer);
buffer = ByteBuffer.allocate(1024);
// The recording rate is limited. Sleep for a short time to prevent high CPU usage.
Thread.sleep(20);
}
}
// Notify the end of transcription.
emitter.onComplete();
} catch (Exception e) {
emitter.onError(e);
}
}).start();
},
BackpressureStrategy.BUFFER);
// Create a Recognizer.
Recognition recognizer = new Recognition();
// Create a RecognitionParam and pass the created Flowable<ByteBuffer> in the audioFrames parameter.
RecognitionParam param = RecognitionParam.builder()
.model("paraformer-realtime-v2")
.format("pcm")
.sampleRate(16000)
// If you have not configured the API key as an environment variable, uncomment the following line of code and replace apiKey with your own API key.
// .apiKey("apikey")
.build();
// Call the interface in a streaming fashion.
recognizer.streamCall(param, audioSource)
// Call the subscribe method of Flowable to subscribe to the results.
.blockingForEach(
result -> {
// Print the final result.
if (result.isSentenceEnd()) {
System.out.println("Fix:" + result.getSentence().getText());
} else {
System.out.println("Result:" + result.getSentence().getText());
}
});
System.exit(0);
}
}
Python
Before running the Python example, run pip install pyaudio
import pyaudio
from dashscope.audio.asr import (Recognition, RecognitionCallback,
                                 RecognitionResult)

# If you have not configured the API key as an environment variable, uncomment the following lines and replace apiKey with your own API key.
# import dashscope
# dashscope.api_key = "apiKey"

mic = None
stream = None


class Callback(RecognitionCallback):
    def on_open(self) -> None:
        global mic
        global stream
        print('RecognitionCallback open.')
        mic = pyaudio.PyAudio()
        stream = mic.open(format=pyaudio.paInt16,
                          channels=1,
                          rate=16000,
                          input=True)

    def on_close(self) -> None:
        global mic
        global stream
        print('RecognitionCallback close.')
        stream.stop_stream()
        stream.close()
        mic.terminate()
        stream = None
        mic = None

    def on_event(self, result: RecognitionResult) -> None:
        print('RecognitionCallback sentence: ', result.get_sentence())


callback = Callback()

recognition = Recognition(model='paraformer-realtime-v2',
                          format='pcm',
                          sample_rate=16000,
                          callback=callback)
recognition.start()

# Read from the microphone until the stream is closed
while True:
    if stream:
        data = stream.read(3200, exception_on_overflow=False)
        recognition.send_audio_frame(data)
    else:
        break

recognition.stop()
Recognize a local audio file
This feature recognizes and transcribes local audio files. It is ideal for short audio scenarios like voice chats, commands, voice input, or voice search.
Java
import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
public class Main {
public static void main(String[] args) {
// You can ignore the file download from the URL. Use a local file to call the API for recognition.
String exampleWavUrl =
"https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav";
try {
InputStream in = new URL(exampleWavUrl).openStream();
Files.copy(in, Paths.get("asr_example.wav"), StandardCopyOption.REPLACE_EXISTING);
} catch (IOException e) {
System.out.println("error: " + e);
System.exit(1);
}
// Create a Recognition instance.
Recognition recognizer = new Recognition();
// Create a RecognitionParam.
RecognitionParam param =
RecognitionParam.builder()
// If you have not configured the API key as an environment variable, uncomment the following line of code and replace apiKey with your own API key.
// .apiKey("apikey")
.model("paraformer-realtime-v2")
.format("wav")
.sampleRate(16000)
// "language_hints" is only supported by the paraformer-v2 and paraformer-realtime-v2 models.
.parameter("language_hints", new String[]{"zh", "en"})
.build();
try {
System.out.println("Recognition result: " + recognizer.call(param, new File("asr_example.wav")));
} catch (Exception e) {
e.printStackTrace();
}
System.exit(0);
}
}
Python
import requests
from http import HTTPStatus
from dashscope.audio.asr import Recognition

# If you have not configured the API key as an environment variable, uncomment the following lines and replace apiKey with your own API key.
# import dashscope
# dashscope.api_key = "apiKey"

# You can skip the file download and use a local file for recognition instead.
r = requests.get(
    'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav'
)
with open('asr_example.wav', 'wb') as f:
    f.write(r.content)

recognition = Recognition(model='paraformer-realtime-v2',
                          format='wav',
                          sample_rate=16000,
                          # "language_hints" is only supported by the paraformer-v2 and paraformer-realtime-v2 models.
                          language_hints=['zh', 'en'],
                          callback=None)
result = recognition.call('asr_example.wav')
if result.status_code == HTTPStatus.OK:
    print('Recognition result:')
    print(result.get_sentence())
else:
    print('Error: ', result.message)
Going live
Improve recognition accuracy
Select a model with the correct sample rate: For 8 kHz telephone audio, use an 8 kHz model directly, because upsampling to 16 kHz introduces distortion without adding information.
Optimize input audio quality: Use high-quality microphones with high SNR and no echo. At the application level, integrate noise reduction (like RNNoise) or acoustic echo cancellation (AEC) to preprocess audio.
Specify the recognition language: For multilingual models such as paraformer-realtime-v2, use the language_hints parameter (for example, ['zh', 'en']) to help the model converge and avoid confusion between similar-sounding languages.
Filter disfluent words: For Paraformer, set the disfluency_removal_enabled parameter to true to produce more formal, readable text.
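The accuracy tips above mostly come down to constructor parameters. A hedged sketch of how they might be assembled, using the parameter names shown in the samples above (the helper function itself is hypothetical, not part of the SDK):

```python
# Hypothetical helper: assemble keyword arguments for dashscope's
# Recognition(...) constructor, using parameter names from the samples.
def build_recognition_kwargs(model: str, *, telephone_audio: bool = False,
                             languages=None, remove_disfluency: bool = False):
    kwargs = {
        'model': model,
        'format': 'pcm',
        # 8 kHz telephone audio should stay at 8 kHz with an 8 kHz model:
        # upsampling to 16 kHz distorts the signal without adding information.
        'sample_rate': 8000 if telephone_audio else 16000,
    }
    if languages:
        # Only supported by paraformer-v2 / paraformer-realtime-v2.
        kwargs['language_hints'] = list(languages)
    if remove_disfluency:
        # Paraformer only: filter disfluent words from the transcript.
        kwargs['disfluency_removal_enabled'] = True
    return kwargs

print(build_recognition_kwargs('paraformer-realtime-v2',
                               languages=('zh', 'en')))
```

The resulting dictionary would be passed as `Recognition(**kwargs, callback=...)`; verify each parameter against the API reference for the specific model you deploy.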
Set a fault tolerance policy
Client-side reconnection: Implement automatic reconnection to handle network jitter. For the Python SDK:
Catch exceptions: Implement on_error in the Callback class. The dashscope SDK calls this method when network errors occur.
Notify status: When on_error triggers, set a reconnection signal using a thread-safe threading.Event flag.
Reconnection loop: Wrap the main logic in a for loop that retries up to 3 times. When the signal is set, interrupt recognition, clean up resources, wait a few seconds, and reconnect.
Set a heartbeat to prevent connection loss: Set the heartbeat parameter to true to maintain the connection during long periods of silence.
Rate limiting: Follow the model's rate limiting rules.
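The reconnection steps above can be sketched as a small skeleton. `FlakyRecognizer` here is a stand-in for the dashscope Recognition object (it fails twice, then succeeds, to exercise the loop), and the retry count and back-off delay are illustrative assumptions:

```python
import threading
import time

class FlakyRecognizer:
    """Stand-in for dashscope's Recognition: fails twice, then succeeds."""
    def __init__(self, reconnect_signal):
        self.reconnect_signal = reconnect_signal
        self.failures_left = 2
    def start(self):
        pass
    def run_session(self):
        # In the real SDK, a network error would invoke Callback.on_error,
        # which is where the reconnect signal should be set.
        if self.failures_left > 0:
            self.failures_left -= 1
            self.reconnect_signal.set()
    def stop(self):
        pass

def run_with_reconnect(max_retries=3, backoff_s=0.01):
    reconnect = threading.Event()      # thread-safe flag, set by on_error
    recognizer = FlakyRecognizer(reconnect)
    for attempt in range(max_retries):
        reconnect.clear()
        recognizer.start()
        recognizer.run_session()       # real code: stream audio frames here
        recognizer.stop()              # clean up before any reconnect
        if not reconnect.is_set():
            return attempt + 1         # session finished without error
        time.sleep(backoff_s)          # wait briefly, then reconnect
    raise RuntimeError('gave up after %d attempts' % max_retries)

print(run_with_reconnect())  # two simulated failures, then success on attempt 3
```

In production, re-create the Recognition object on each attempt and set the reconnect flag only from within on_error, so the audio-sending thread and the callback thread coordinate solely through the Event.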
API reference
Compare models
| Features | Fun-ASR | Paraformer |
| --- | --- | --- |
| Supported languages | Varies by model | Varies by model |
| Supported audio formats | pcm, wav, mp3, opus, speex, aac, amr | pcm, wav, mp3, opus, speex, aac, amr |
| Sample rate | Varies by model | Varies by model |
| Sound channel | Mono | Mono |
| Input format | Binary audio stream | Binary audio stream |
| Audio size/duration | Unlimited | Unlimited |
| Emotion recognition | Varies by model | |
| Sensitive words filter | | |
| Speaker diarization | | |
| Modal particle filtering | Disabled by default, can be enabled | |
| Timestamp | Always enabled | |
| Punctuation prediction | Always enabled | Varies by model |
| Hotwords | | |
| ITN | Always enabled | |
| VAD | Always enabled | |
| Rate limit (RPS) | 20 | 20 |
| Connection type | Java/Python/Android/iOS SDK, WebSocket API | Java/Python/Android/iOS SDK, WebSocket API |
| Price | Varies by model | Chinese mainland: $0.000012/second |
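Per-second pricing adds up predictably over long audio. A quick cost estimate using the Chinese mainland Paraformer rate from the table above ($0.000012/second); the helper is illustrative, and other models' rates vary:

```python
RATE_USD_PER_SECOND = 0.000012  # Paraformer, Chinese mainland (from the table above)

def transcription_cost_usd(audio_seconds: float) -> float:
    """Estimated cost of transcribing audio_seconds of audio at the flat per-second rate."""
    return audio_seconds * RATE_USD_PER_SECOND

print(round(transcription_cost_usd(3600), 4))      # one hour of audio -> 0.0432
print(round(transcription_cost_usd(8 * 3600), 4))  # an 8-hour workday -> 0.3456
```

Check the current pricing page before budgeting, since rates differ by model and region.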