全部产品
Search
文档中心

大模型服务平台百炼:实时语音识别-Fun-ASR/Paraformer

更新时间:Feb 12, 2026

实时语音识别服务可将音频流实时转换为带标点的文本,实现“边说边出文字”的效果。无论是麦克风语音、会议录音还是本地音频文件,都能轻松转录。服务广泛应用于会议实时记录、直播字幕、语音聊天、智能客服等场景。

核心功能

  • 支持多语种实时语音识别,覆盖中英文及多种方言

  • 支持热词定制,可提升特定词汇的识别准确率

  • 支持时间戳输出,生成结构化识别结果

  • 灵活采样率与多种音频格式,适配不同录音环境

  • 可选VAD(Voice Activity Detection),自动过滤静音片段,提升长音频处理效率

  • SDK + WebSocket 接入,低延迟稳定服务

适用范围

支持的模型:

国际

国际部署模式下,接入点与数据存储均位于新加坡地域,模型推理计算资源在全球范围内动态调度(不含中国内地)。

调用以下模型时,请选择新加坡地域的API Key

Fun-ASR:fun-asr-realtime(稳定版,当前等同fun-asr-realtime-2025-11-07)、fun-asr-realtime-2025-11-07(快照版)

中国内地

中国内地部署模式下,接入点与数据存储均位于北京地域,模型推理计算资源仅限于中国内地。

调用以下模型时,请选择北京地域的API Key

  • Fun-ASR:fun-asr-realtime(稳定版,当前等同fun-asr-realtime-2025-11-07)、fun-asr-realtime-2025-11-07(快照版)、fun-asr-realtime-2025-09-15(快照版)、fun-asr-flash-8k-realtime(稳定版,当前等同fun-asr-flash-8k-realtime-2026-01-28)、fun-asr-flash-8k-realtime-2026-01-28

  • Paraformer:paraformer-realtime-v2、paraformer-realtime-v1、paraformer-realtime-8k-v2、paraformer-realtime-8k-v1

更多信息请参见模型列表

模型选型

场景

推荐模型

理由

中文普通话识别(会议/直播)

fun-asr-realtime、fun-asr-realtime-2025-11-07、paraformer-realtime-v2

多格式兼容,高采样率支持,稳定延迟

多语种识别(国际会议)

paraformer-realtime-v2

覆盖多语种

中文方言识别(客服/政务)

fun-asr-realtime-2025-11-07、paraformer-realtime-v2

覆盖多地方言

中英日混合识别(课堂/演讲)

fun-asr-realtime、fun-asr-realtime-2025-11-07

中英日识别优化

低带宽电话录音转写

fun-asr-flash-8k-realtime

专为中文电话客服场景设计

热词定制场景(品牌名/专有术语)

Paraformer、Fun-ASR最新版本模型

热词可开关,易于迭代配置

更多说明请参见模型功能特性对比

快速开始

下面是调用API的示例代码。更多常用场景的代码示例,请参见GitHub

您需要已获取API Key配置API Key到环境变量。如果通过SDK调用,还需要安装DashScope SDK

Fun-ASR

识别传入麦克风的语音

实时语音识别可以识别麦克风中传入的语音并输出识别结果,达到“边说边出文字”的效果。

Java

import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.utils.Constants;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;

import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {
    public static void main(String[] args) throws InterruptedException {
        // 以下为新加坡地域url,若使用北京地域的模型,需将url替换为:wss://dashscope.aliyuncs.com/api-ws/v1/inference
        Constants.baseWebsocketApiUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference";
        ExecutorService executorService = Executors.newSingleThreadExecutor();
        executorService.submit(new RealtimeRecognitionTask());
        executorService.shutdown();
        executorService.awaitTermination(1, TimeUnit.MINUTES);
        System.exit(0);
    }
}

class RealtimeRecognitionTask implements Runnable {
    @Override
    public void run() {
        RecognitionParam param = RecognitionParam.builder()
                .model("fun-asr-realtime")
                // 新加坡地域和北京地域的API Key不同。获取API Key:https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // 若没有配置环境变量,请用百炼API Key将下行替换为:.apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .format("pcm")
                .sampleRate(16000)
                .build();
        Recognition recognizer = new Recognition();

        ResultCallback<RecognitionResult> callback = new ResultCallback<RecognitionResult>() {
            @Override
            public void onEvent(RecognitionResult result) {
                if (result.isSentenceEnd()) {
                    System.out.println("Final Result: " + result.getSentence().getText());
                } else {
                    System.out.println("Intermediate Result: " + result.getSentence().getText());
                }
            }

            @Override
            public void onComplete() {
                System.out.println("Recognition complete");
            }

            @Override
            public void onError(Exception e) {
                System.out.println("RecognitionCallback error: " + e.getMessage());
            }
        };
        try {
            recognizer.call(param, callback);
            // 创建音频格式
            AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
            // 根据格式匹配默认录音设备
            TargetDataLine targetDataLine =
                    AudioSystem.getTargetDataLine(audioFormat);
            targetDataLine.open(audioFormat);
            // 开始录音
            targetDataLine.start();
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            long start = System.currentTimeMillis();
            // 录音50s并进行实时转写
            while (System.currentTimeMillis() - start < 50000) {
                int read = targetDataLine.read(buffer.array(), 0, buffer.capacity());
                if (read > 0) {
                    buffer.limit(read);
                    // 将录音音频数据发送给流式识别服务
                    recognizer.sendAudioFrame(buffer);
                    buffer = ByteBuffer.allocate(1024);
                    // 录音速率有限,防止cpu占用过高,休眠一小会儿
                    Thread.sleep(20);
                }
            }
            recognizer.stop();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // 任务结束后关闭 Websocket 连接
            recognizer.getDuplexApi().close(1000, "bye");
        }

        System.out.println(
                "[Metric] requestId: "
                        + recognizer.getLastRequestId()
                        + ", first package delay ms: "
                        + recognizer.getFirstPackageDelay()
                        + ", last package delay ms: "
                        + recognizer.getLastPackageDelay());
    }
}

Python

运行Python示例前,需要通过pip install pyaudio命令安装第三方音频播放与采集套件。

import os
import signal  # for keyboard events handling (press "Ctrl+C" to terminate recording)
import sys

import dashscope
import pyaudio
from dashscope.audio.asr import *

mic = None
stream = None

# Set recording parameters
sample_rate = 16000  # sampling rate (Hz)
channels = 1  # mono channel
dtype = 'int16'  # data type
format_pcm = 'pcm'  # the format of the audio data
block_size = 3200  # number of frames per buffer


# Real-time speech recognition callback
class Callback(RecognitionCallback):
    def on_open(self) -> None:
        global mic
        global stream
        print('RecognitionCallback open.')
        mic = pyaudio.PyAudio()
        stream = mic.open(format=pyaudio.paInt16,
                          channels=1,
                          rate=16000,
                          input=True)

    def on_close(self) -> None:
        global mic
        global stream
        print('RecognitionCallback close.')
        stream.stop_stream()
        stream.close()
        mic.terminate()
        stream = None
        mic = None

    def on_complete(self) -> None:
        print('RecognitionCallback completed.')  # recognition completed

    def on_error(self, message) -> None:
        print('RecognitionCallback task_id: ', message.request_id)
        print('RecognitionCallback error: ', message.message)
        # Stop and close the audio stream if it is running
        if 'stream' in globals() and stream.active:
            stream.stop()
            stream.close()
        # Forcefully exit the program
        sys.exit(1)

    def on_event(self, result: RecognitionResult) -> None:
        sentence = result.get_sentence()
        if 'text' in sentence:
            print('RecognitionCallback text: ', sentence['text'])
            if RecognitionResult.is_sentence_end(sentence):
                print(
                    'RecognitionCallback sentence end, request_id:%s, usage:%s'
                    % (result.get_request_id(), result.get_usage(sentence)))


def signal_handler(sig, frame):
    print('Ctrl+C pressed, stop recognition ...')
    # Stop recognition
    recognition.stop()
    print('Recognition stopped.')
    print(
        '[Metric] requestId: {}, first package delay ms: {}, last package delay ms: {}'
        .format(
            recognition.get_last_request_id(),
            recognition.get_first_package_delay(),
            recognition.get_last_package_delay(),
        ))
    # Forcefully exit the program
    sys.exit(0)


# main function
if __name__ == '__main__':
    # 新加坡地域和北京地域的API Key不同。获取API Key:https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # 若没有配置环境变量,请用百炼API Key将下行替换为:dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')

    # 以下为新加坡地域url,若使用北京地域的模型,需将url替换为:wss://dashscope.aliyuncs.com/api-ws/v1/inference
    dashscope.base_websocket_api_url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference'

    # Create the recognition callback
    callback = Callback()

    # Call recognition service by async mode, you can customize the recognition parameters, like model, format,
    # sample_rate
    recognition = Recognition(
        model='fun-asr-realtime',
        format=format_pcm,
        # 'pcm'、'wav'、'opus'、'speex'、'aac'、'amr', you can check the supported formats in the document
        sample_rate=sample_rate,
        # support 8000, 16000
        semantic_punctuation_enabled=False,
        callback=callback)

    # Start recognition
    recognition.start()

    signal.signal(signal.SIGINT, signal_handler)
    print("Press 'Ctrl+C' to stop recording and recognition...")
    # Create a keyboard listener until "Ctrl+C" is pressed

    while True:
        if stream:
            data = stream.read(3200, exception_on_overflow=False)
            recognition.send_audio_frame(data)
        else:
            break

    recognition.stop()

识别本地音频文件

实时语音识别可以识别本地音频文件并输出识别结果。对于对话聊天、控制口令、语音输入法、语音搜索等较短的准实时语音识别场景可考虑采用该接口进行语音识别。

Java

示例中用到的音频为:asr_example.wav

import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.utils.Constants;

import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class TimeUtils {
    private static final DateTimeFormatter formatter =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    public static String getTimestamp() {
        return LocalDateTime.now().format(formatter);
    }
}

public class Main {
    public static void main(String[] args) throws InterruptedException {
        // 以下为新加坡地域url,若使用北京地域的模型,需将url替换为:wss://dashscope.aliyuncs.com/api-ws/v1/inference
        Constants.baseWebsocketApiUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference";
        ExecutorService executorService = Executors.newSingleThreadExecutor();
        executorService.submit(new RealtimeRecognitionTask(Paths.get(System.getProperty("user.dir"), "asr_example.wav")));
        executorService.shutdown();

        // wait for all tasks to complete
        executorService.awaitTermination(1, TimeUnit.MINUTES);
        System.exit(0);
    }
}

class RealtimeRecognitionTask implements Runnable {
    private Path filepath;

    public RealtimeRecognitionTask(Path filepath) {
        this.filepath = filepath;
    }

    @Override
    public void run() {
        RecognitionParam param = RecognitionParam.builder()
                .model("fun-asr-realtime")
                // 新加坡地域和北京地域的API Key不同。获取API Key:https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // 若没有配置环境变量,请用百炼API Key将下行替换为:.apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .format("wav")
                .sampleRate(16000)
                .build();
        Recognition recognizer = new Recognition();

        String threadName = Thread.currentThread().getName();

        ResultCallback<RecognitionResult> callback = new ResultCallback<RecognitionResult>() {
            @Override
            public void onEvent(RecognitionResult message) {
                if (message.isSentenceEnd()) {

                    System.out.println(TimeUtils.getTimestamp()+" "+
                            "[process " + threadName + "] Final Result:" + message.getSentence().getText());
                } else {
                    System.out.println(TimeUtils.getTimestamp()+" "+
                            "[process " + threadName + "] Intermediate Result: " + message.getSentence().getText());
                }
            }

            @Override
            public void onComplete() {
                System.out.println(TimeUtils.getTimestamp()+" "+"[" + threadName + "] Recognition complete");
            }

            @Override
            public void onError(Exception e) {
                System.out.println(TimeUtils.getTimestamp()+" "+
                        "[" + threadName + "] RecognitionCallback error: " + e.getMessage());
            }
        };

        try {
            recognizer.call(param, callback);
            // Please replace the path with your audio file path
            System.out.println(TimeUtils.getTimestamp()+" "+"[" + threadName + "] Input file_path is: " + this.filepath);
            // Read file and send audio by chunks
            FileInputStream fis = new FileInputStream(this.filepath.toFile());
            // chunk size set to 1 seconds for 16KHz sample rate
            byte[] buffer = new byte[3200];
            int bytesRead;
            // Loop to read chunks of the file
            while ((bytesRead = fis.read(buffer)) != -1) {
                ByteBuffer byteBuffer;
                // Handle the last chunk which might be smaller than the buffer size
                System.out.println(TimeUtils.getTimestamp()+" "+"[" + threadName + "] bytesRead: " + bytesRead);
                if (bytesRead < buffer.length) {
                    byteBuffer = ByteBuffer.wrap(buffer, 0, bytesRead);
                } else {
                    byteBuffer = ByteBuffer.wrap(buffer);
                }

                recognizer.sendAudioFrame(byteBuffer);
                buffer = new byte[3200];
                Thread.sleep(100);
            }
            System.out.println(TimeUtils.getTimestamp()+" "+LocalDateTime.now());
            recognizer.stop();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // 任务结束后关闭 Websocket 连接
            recognizer.getDuplexApi().close(1000, "bye");
        }

        System.out.println(
                "["
                        + threadName
                        + "][Metric] requestId: "
                        + recognizer.getLastRequestId()
                        + ", first package delay ms: "
                        + recognizer.getFirstPackageDelay()
                        + ", last package delay ms: "
                        + recognizer.getLastPackageDelay());
    }
}

Python

示例中用到的音频为:asr_example.wav

import os
import time
import dashscope
from dashscope.audio.asr import *

# 新加坡地域和北京地域的API Key不同。获取API Key:https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# 若没有配置环境变量,请用百炼API Key将下行替换为:dashscope.api_key = "sk-xxx"
dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')

# 以下为新加坡地域url,若使用北京地域的模型,需将url替换为:wss://dashscope.aliyuncs.com/api-ws/v1/inference
dashscope.base_websocket_api_url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference'

from datetime import datetime

def get_timestamp():
    now = datetime.now()
    formatted_timestamp = now.strftime("[%Y-%m-%d %H:%M:%S.%f]")
    return formatted_timestamp

class Callback(RecognitionCallback):
    def on_complete(self) -> None:
        print(get_timestamp() + ' Recognition completed')  # recognition complete

    def on_error(self, result: RecognitionResult) -> None:
        print('Recognition task_id: ', result.request_id)
        print('Recognition error: ', result.message)
        exit(0)

    def on_event(self, result: RecognitionResult) -> None:
        sentence = result.get_sentence()
        if 'text' in sentence:
            print(get_timestamp() + ' RecognitionCallback text: ', sentence['text'])
            if RecognitionResult.is_sentence_end(sentence):
                print(get_timestamp() + 
                    'RecognitionCallback sentence end, request_id:%s, usage:%s'
                    % (result.get_request_id(), result.get_usage(sentence)))


callback = Callback()

recognition = Recognition(model='fun-asr-realtime',
                          format='wav',
                          sample_rate=16000,
                          callback=callback)

recognition.start()

try:
    audio_data: bytes = None
    f = open("asr_example.wav", 'rb')
    if os.path.getsize("asr_example.wav"):
        while True:
            audio_data = f.read(3200)
            if not audio_data:
                break
            else:
                recognition.send_audio_frame(audio_data)
            time.sleep(0.1)
    else:
        raise Exception(
            'The supplied file was empty (zero bytes long)')
    f.close()
except Exception as e:
    raise e

recognition.stop()

print(
    '[Metric] requestId: {}, first package delay ms: {}, last package delay ms: {}'
    .format(
        recognition.get_last_request_id(),
        recognition.get_first_package_delay(),
        recognition.get_last_package_delay(),
    ))

Paraformer

识别传入麦克风的语音

实时语音识别可以识别麦克风中传入的语音并输出识别结果,达到“边说边出文字”的效果。

Java

import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.BackpressureStrategy;
import io.reactivex.Flowable;

import java.nio.ByteBuffer;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;

public class Main {

    public static void main(String[] args) throws NoApiKeyException {
        // 创建一个Flowable<ByteBuffer>
        Flowable<ByteBuffer> audioSource = Flowable.create(emitter -> {
            new Thread(() -> {
                try {
                    // 创建音频格式
                    AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
                    // 根据格式匹配默认录音设备
                    TargetDataLine targetDataLine =
                            AudioSystem.getTargetDataLine(audioFormat);
                    targetDataLine.open(audioFormat);
                    // 开始录音
                    targetDataLine.start();
                    ByteBuffer buffer = ByteBuffer.allocate(1024);
                    long start = System.currentTimeMillis();
                    // 录音300s并进行实时转写
                    while (System.currentTimeMillis() - start < 300000) {
                        int read = targetDataLine.read(buffer.array(), 0, buffer.capacity());
                        if (read > 0) {
                            buffer.limit(read);
                            // 将录音音频数据发送给流式识别服务
                            emitter.onNext(buffer);
                            buffer = ByteBuffer.allocate(1024);
                            // 录音速率有限,防止cpu占用过高,休眠一小会儿
                            Thread.sleep(20);
                        }
                    }
                    // 通知结束转写
                    emitter.onComplete();
                } catch (Exception e) {
                    emitter.onError(e);
                }
            }).start();
        },
        BackpressureStrategy.BUFFER);

        // 创建Recognizer
        Recognition recognizer = new Recognition();
        // 创建RecognitionParam,audioFrames参数中传入上面创建的Flowable<ByteBuffer>
        RecognitionParam param = RecognitionParam.builder()
            .model("paraformer-realtime-v2")
            .format("pcm")
            .sampleRate(16000)
            // 若没有将API Key配置到环境变量中,需将下面这行代码注释放开,并将apiKey替换为自己的API Key
            // .apiKey("apikey")
            .build();

        // 流式调用接口
        recognizer.streamCall(param, audioSource)
            // 调用Flowable的subscribe方法订阅结果
            .blockingForEach(
                result -> {
                    // 打印最终结果
                    if (result.isSentenceEnd()) {
                        System.out.println("Fix:" + result.getSentence().getText());
                    } else {
                        System.out.println("Result:" + result.getSentence().getText());
                    }
                });
        System.exit(0);
    }
}

Python

运行Python示例前,需要通过pip install pyaudio命令安装第三方音频播放与采集套件。

import pyaudio
from dashscope.audio.asr import (Recognition, RecognitionCallback,
                                 RecognitionResult)

# 若没有将API Key配置到环境变量中,需将下面这行代码注释放开,并将apiKey替换为自己的API Key
# import dashscope
# dashscope.api_key = "apiKey"

mic = None
stream = None


class Callback(RecognitionCallback):
    def on_open(self) -> None:
        global mic
        global stream
        print('RecognitionCallback open.')
        mic = pyaudio.PyAudio()
        stream = mic.open(format=pyaudio.paInt16,
                          channels=1,
                          rate=16000,
                          input=True)

    def on_close(self) -> None:
        global mic
        global stream
        print('RecognitionCallback close.')
        stream.stop_stream()
        stream.close()
        mic.terminate()
        stream = None
        mic = None

    def on_event(self, result: RecognitionResult) -> None:
        print('RecognitionCallback sentence: ', result.get_sentence())


callback = Callback()
recognition = Recognition(model='paraformer-realtime-v2',
                          format='pcm',
                          sample_rate=16000,
                          callback=callback)
recognition.start()

while True:
    if stream:
        data = stream.read(3200, exception_on_overflow=False)
        recognition.send_audio_frame(data)
    else:
        break

recognition.stop()

识别本地音频文件

实时语音识别可以识别本地音频文件并输出识别结果。对于对话聊天、控制口令、语音输入法、语音搜索等较短的准实时语音识别场景可考虑采用该接口进行语音识别。

Java

import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class Main {
    public static void main(String[] args) {
        // 用户可忽略url下载文件部分,可以直接使用本地文件进行相关api调用进行识别
        String exampleWavUrl =
                "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav";
        try {
            InputStream in = new URL(exampleWavUrl).openStream();
            Files.copy(in, Paths.get("asr_example.wav"), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            System.out.println("error: " + e);
            System.exit(1);
        }

        // 创建Recognition实例
        Recognition recognizer = new Recognition();
        // 创建RecognitionParam
        RecognitionParam param =
                RecognitionParam.builder()
                        // 若没有将API Key配置到环境变量中,需将下面这行代码注释放开,并将apiKey替换为自己的API Key
                        // .apiKey("apikey")
                        .model("paraformer-realtime-v2")
                        .format("wav")
                        .sampleRate(16000)
                        // “language_hints”只支持paraformer-v2和paraformer-realtime-v2模型
                        .parameter("language_hints", new String[]{"zh", "en"})
                        .build();

        try {
            System.out.println("识别结果:" + recognizer.call(param, new File("asr_example.wav")));
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(0);
    }
}

Python

import requests
from http import HTTPStatus
from dashscope.audio.asr import Recognition

# 若没有将API Key配置到环境变量中,需将下面这行代码注释放开,并将apiKey替换为自己的API Key
# import dashscope
# dashscope.api_key = "apiKey"

# 用户可忽略从url下载文件这部分代码,直接使用本地文件进行识别
r = requests.get(
    'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav'
)
with open('asr_example.wav', 'wb') as f:
    f.write(r.content)

recognition = Recognition(model='paraformer-realtime-v2',
                          format='wav',
                          sample_rate=16000,
                          # “language_hints”只支持paraformer-v2和paraformer-realtime-v2模型
                          language_hints=['zh', 'en'],
                          callback=None)
result = recognition.call('asr_example.wav')
if result.status_code == HTTPStatus.OK:
    print('识别结果:')
    print(result.get_sentence())
else:
    print('Error: ', result.message)

应用于生产环境

提升识别效果

  • 选择正确采样率的模型:8kHz 的电话音频应直接使用 8kHz 模型,而不是升采样到 16kHz 再识别,这样可以避免信息失真,获得更佳效果。

  • 使用热词功能:针对业务中的专有名词、人名、品牌名等,配置热词可以显著提升识别准确率,详情请参见定制热词-Paraformer/Fun-ASR

  • 优化输入音频质量:尽量使用高质量的麦克风,并确保录音环境信噪比高、无回声。在应用层面,可以集成降噪(如RNNoise)、回声消除(AEC)等算法对音频进行预处理,以获得更纯净的音频。

  • 明确指定识别语种:对于Paraformer-v2等支持多语种的模型,如果在调用时能预先确定音频的语种(如使用Language_hints参数指定语种为['zh','en']),可以帮助模型收敛,避免在相似发音的语种间混淆,提升准确性。

  • 语气词过滤:对于Paraformer模型,可以通过设置参数disfluency_removal_enabled开启语气词过滤功能,获得更书面、更易读的文本结果。

设置容错策略

  • 客户端重连:客户端应实现断线自动重连机制,以应对网络抖动。以Python SDK为例,您可以参考如下建议:

    1. 捕获异常:在Callback类中实现on_error方法。当dashscope SDK遇到网络错误或其他问题时,会调用该方法。

    2. 状态通知:当on_error被触发时,设置重连信号。在Python中可以使用threading.Event,它是一种线程安全的信号标志。

    3. 重连循环:将主逻辑包裹在一个for循环中(例如重试3次)。当检测到重连信号后,当前轮次的识别会中断,清理资源,然后等待几秒钟,再次进入循环,创建一个全新的连接。

  • 设置心跳防止连接断开:当需要与服务端保持长连接时,可将参数heartbeat设置为true,即使音频中长时间没有声音,与服务端的连接也不会中断。

  • 模型限流:在调用模型接口时请注意模型的限流规则。

API参考

模型功能特性对比

功能/特性

Fun-ASR

Paraformer

支持语言

因模型而异:

  • fun-asr-realtime、fun-asr-realtime-2025-11-07:中文(普通话、粤语、吴语、闽南语、客家话、赣语、湘语、晋语;并支持中原、西南、冀鲁、江淮、兰银、胶辽、东北、北京、港台等,包括河南、陕西、湖北、四川、重庆、云南、贵州、广东、广西、河北、天津、山东、安徽、南京、江苏、杭州、甘肃、宁夏等地区官话口音)、英文、日语

  • fun-asr-realtime-2025-09-15:中文(普通话)、英文

  • fun-asr-flash-8k-realtime、fun-asr-flash-8k-realtime-2026-01-28:中文

因模型而异:

  • paraformer-realtime-v2:中文(普通话、粤语、吴语、闽南语、东北话、甘肃话、贵州话、河南话、湖北话、湖南话、宁夏话、山西话、陕西话、山东话、四川话、天津话、江西话、云南话、上海话)、英文、日语、韩语、德语、法语、俄语

  • paraformer-realtime-v1、paraformer-realtime-8k-v2、paraformer-realtime-8k-v1:中文(普通话)

支持的音频格式

pcm、wav、mp3、opus、speex、aac、amr

采样率

因模型而异:

  • fun-asr-realtime、fun-asr-realtime-2025-11-07、fun-asr-realtime-2025-09-15:16kHz

  • fun-asr-flash-8k-realtime、fun-asr-flash-8k-realtime-2026-01-28:8kHz

因模型而异:

  • paraformer-realtime-v2:任意采样率

  • paraformer-realtime-v1:16kHz

  • paraformer-realtime-8k-v2、paraformer-realtime-8k-v1:8kHz

声道

单声道

输入形式

二进制音频流

音频大小/时长

不限

不限

情感识别

不支持

因模型而异:

  • paraformer-realtime-v2、paraformer-realtime-v1、paraformer-realtime-8k-v1:不支持

  • paraformer-realtime-8k-v2:支持 默认开启,可关闭

敏感词过滤

不支持

说话人分离

不支持

语气词过滤

支持 默认关闭,可开启

支持 默认关闭,可开启

时间戳

支持 固定开启

标点符号预测

支持 固定开启

因模型而异:

  • paraformer-realtime-v2、paraformer-realtime-8k-v2:支持 默认开启,可关闭

  • paraformer-realtime-v1、paraformer-realtime-8k-v1:支持 固定开启

热词

因模型而异:

  • fun-asr-flash-8k-realtime、fun-asr-flash-8k-realtime-2026-01-28:不支持

  • 其他模型:支持 可配置

ITN

支持 固定开启

VAD

支持 固定开启

限流(RPS)

20

20

接入方式

Java/Python/Android/iOS SDK、WebSocket API

价格

因模型而异:

  • fun-asr-realtime、fun-asr-realtime-2025-11-07:

    • 国际:$0.00009/秒

    • 中国内地:$0.000047/秒

  • fun-asr-realtime-2025-09-15:

    • 中国内地:$0.000047/秒

  • fun-asr-flash-8k-realtime、fun-asr-flash-8k-realtime-2026-01-28:

    • 中国内地:$0.000032/秒

中国内地:$0.000012/秒