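Audio file recognition - Alibaba Cloud Model Studio (Bailian)

The Fun-ASR and Paraformer audio file recognition models convert recorded audio into text. They support both single-file and batch recognition and suit scenarios that do not need results in real time.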

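Core features

Multilingual recognition: recognizes Chinese (including many dialects), English, Japanese, Korean, German, French, Russian, and other languages.
Broad format compatibility: supports any sampling rate and common audio and video formats such as aac, wav, and mp3.
Long audio processing: asynchronously transcribes a single audio file of up to 12 hours in duration and up to 2 GB in size.
Singing recognition: transcribes entire songs even when background music (BGM) is present. Only the fun-asr and fun-asr-2025-11-07 models support this feature.
Rich recognition features: configurable options such as speaker diarization, sensitive-word filtering, sentence- and word-level timestamps, and hotword boosting to meet custom needs.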
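
As a minimal sketch of how a client might check inputs against these limits before submitting a task, the helper below flags files that exceed the 12-hour or 2 GB bounds, and batches larger than the 100-file cap listed in the feature comparison table of this document. `AudioFile` and `find_limit_violations` are hypothetical names, not part of the DashScope SDK, and the code assumes that 2 GB means 2 × 1024³ bytes and that you already know each file's size and duration.

```python
from dataclasses import dataclass
from typing import List

MAX_FILES_PER_TASK = 100          # cap from the feature comparison table
MAX_FILE_BYTES = 2 * 1024 ** 3    # 2 GB, assuming binary gigabytes
MAX_DURATION_SECONDS = 12 * 3600  # 12 hours


@dataclass
class AudioFile:
    """Hypothetical description of one input file (not an SDK type)."""
    url: str
    size_bytes: int
    duration_seconds: float


def find_limit_violations(files: List[AudioFile]) -> List[str]:
    """Return one human-readable message per violated limit."""
    problems = []
    if len(files) > MAX_FILES_PER_TASK:
        problems.append(f"too many files: {len(files)} > {MAX_FILES_PER_TASK}")
    for f in files:
        if f.size_bytes > MAX_FILE_BYTES:
            problems.append(f"{f.url}: larger than 2 GB")
        if f.duration_seconds > MAX_DURATION_SECONDS:
            problems.append(f"{f.url}: longer than 12 hours")
    return problems


if __name__ == "__main__":
    batch = [AudioFile("https://example.com/meeting.wav", 150_000_000, 5400.0)]
    print(find_limit_violations(batch))
```

An empty list means the batch passes these three checks; it says nothing about format support or URL reachability.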

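Availability

Supported models:

International

In the international deployment mode, the endpoint and data storage are both located in the Singapore region, and model inference compute is scheduled dynamically worldwide (excluding the Chinese mainland).

When calling the following models, use an API key for the Singapore region:

Fun-ASR: fun-asr (stable version, currently equivalent to fun-asr-2025-11-07), fun-asr-2025-11-07 (snapshot), fun-asr-2025-08-25 (snapshot), fun-asr-mtl (stable version, currently equivalent to fun-asr-mtl-2025-08-25), fun-asr-mtl-2025-08-25 (snapshot)

Chinese mainland

In the Chinese mainland deployment mode, the endpoint and data storage are both located in the Beijing region, and model inference compute is limited to the Chinese mainland.

When calling the following models, use an API key for the Beijing region:

Fun-ASR: fun-asr (stable version, currently equivalent to fun-asr-2025-11-07), fun-asr-2025-11-07 (snapshot), fun-asr-2025-08-25 (snapshot), fun-asr-mtl (stable version, currently equivalent to fun-asr-mtl-2025-08-25), fun-asr-mtl-2025-08-25 (snapshot)
Paraformer: paraformer-v2, paraformer-8k-v2

For more information, see the model list.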
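
The region split above can be captured in a small lookup so that a mismatched model and region fails early. This is only a sketch: `REGIONS` and `resolve_endpoint` are hypothetical names, the region keys `"intl"` and `"cn"` are made up for illustration, and the base URLs are the ones used in the quick start sample code in this document.

```python
FUN_ASR_MODELS = {
    "fun-asr",
    "fun-asr-2025-11-07",
    "fun-asr-2025-08-25",
    "fun-asr-mtl",
    "fun-asr-mtl-2025-08-25",
}

# Hypothetical lookup table built from the model lists above.
REGIONS = {
    "intl": {  # Singapore region
        "base_url": "https://dashscope-intl.aliyuncs.com/api/v1",
        "models": FUN_ASR_MODELS,
    },
    "cn": {  # Beijing region
        "base_url": "https://dashscope.aliyuncs.com/api/v1",
        "models": FUN_ASR_MODELS | {"paraformer-v2", "paraformer-8k-v2"},
    },
}


def resolve_endpoint(region: str, model: str) -> str:
    """Return the base URL for a region, or raise if the model is not listed there."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")
    if model not in REGIONS[region]["models"]:
        raise ValueError(f"{model} is not listed for region {region}")
    return REGIONS[region]["base_url"]


if __name__ == "__main__":
    print(resolve_endpoint("cn", "paraformer-v2"))
```

With the Python SDK, the returned value would be assigned to `dashscope.base_http_api_url`, and the API key must belong to the same region.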

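Model selection

Scenario	Recommended model	Reason
Chinese recognition (meetings/live streams)	fun-asr	Deeply optimized for Chinese and covers many dialects; strong far-field VAD and noise robustness make it more accurate in real-world settings that are noisy or involve several speakers at a distance
Multilingual recognition (international conferences)	fun-asr-mtl, paraformer-v2	A single model handles multiple languages, which simplifies development and deployment
Entertainment content analysis and subtitle generation	fun-asr	Its singing recognition transcribes songs and sung segments in live streams; combined with its noise robustness, it is well suited to complex media audio
News/interview subtitle generation	fun-asr, paraformer-v2	Long-audio support, punctuation prediction, and timestamps produce structured subtitles directly
Far-field voice interaction for smart hardware	fun-asr	Far-field VAD (voice activity detection) is specially optimized to capture and recognize distant user commands more accurately in noisy home or in-vehicle environments

For more details, see Model feature comparison.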

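Quick start

The following sample code calls the API.

You need to have obtained an API key and set it as an environment variable. To call the API through the SDK, you also need to install the DashScope SDK.

Fun-ASR

Because audio and video files are usually large, both file transfer and speech recognition take time, so the file transcription API submits tasks asynchronously. After the transcription finishes, developers retrieve the recognition result through the query API.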
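
The submit-then-query pattern described above can be sketched independently of the SDK. In the sketch below, `poll_until_done` and `fetch_status` are hypothetical names: `fetch_status` stands for whatever query call you wrap, and the status strings other than `SUCCEEDED` are assumptions to confirm against the API reference. The demonstration uses a simulated task, so no network access is involved.

```python
import time
from typing import Callable

# Assumed terminal states; check the API reference for the full list.
TERMINAL_STATUSES = frozenset({"SUCCEEDED", "FAILED"})


def poll_until_done(
    fetch_status: Callable[[], str],
    interval_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() repeatedly until it returns a terminal status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status} after {timeout_seconds} s")
        sleep(interval_seconds)


# Simulated task that finishes on the third query.
_fake_states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
final_status = poll_until_done(lambda: next(_fake_states), sleep=lambda _s: None)
print(final_status)
```

The SDK samples that follow use a blocking `wait` call instead, which is simpler when the caller has nothing else to do while the task runs.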

Python

from http import HTTPStatus
from dashscope.audio.asr import Transcription
from urllib import request
import dashscope
import os
import json

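# The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If the environment variable is not set, replace the next line with your Model Studio API key: dashscope.api_key = "sk-xxx"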
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

task_response = Transcription.async_call(
    model='fun-asr',
    file_urls=['https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav',
               'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav'],
    language_hints=['zh', 'en']  # language_hints is optional and specifies the language codes of the audio to recognize. See the API reference for valid values.
)

transcription_response = Transcription.wait(task=task_response.output.task_id)

if transcription_response.status_code == HTTPStatus.OK:
    for transcription in transcription_response.output['results']:
        if transcription['subtask_status'] == 'SUCCEEDED':
            url = transcription['transcription_url']
            result = json.loads(request.urlopen(url).read().decode('utf8'))
            print(json.dumps(result, indent=4,
                            ensure_ascii=False))
        else:
            print('transcription failed!')
            print(transcription)
else:
    print('Error: ', transcription_response.output.message)

Java

import com.alibaba.dashscope.audio.asr.transcription.*;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.*;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.List;

public class Main {
    public static void main(String[] args) {
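        // The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        // Create the transcription request parameters.
        TranscriptionParam param =
                TranscriptionParam.builder()
                        // API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                        // If the environment variable is not set, replace the next line with your Model Studio API key: .apiKey("sk-xxx")
                        .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                        .model("fun-asr")
                        // language_hints is optional and specifies the language codes of the audio to recognize. See the API reference for valid values.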
                        .parameter("language_hints", new String[]{"zh", "en"})
                        .fileUrls(
                                Arrays.asList(
                                        "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
                                        "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav"))
                        .build();
        try {
            Transcription transcription = new Transcription();
            // Submit the transcription request
            TranscriptionResult result = transcription.asyncCall(param);
            System.out.println("RequestId: " + result.getRequestId());
            // Block until the task completes, then fetch the result
            result = transcription.wait(
                    TranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
            // Get the transcription results
            List<TranscriptionTaskResult> taskResultList = result.getResults();
            if (taskResultList != null && taskResultList.size() > 0) {
                for (TranscriptionTaskResult taskResult : taskResultList) {
                    String transcriptionUrl = taskResult.getTranscriptionUrl();
                    HttpURLConnection connection =
                            (HttpURLConnection) new URL(transcriptionUrl).openConnection();
                    connection.setRequestMethod("GET");
                    connection.connect();
                    BufferedReader reader =
                            new BufferedReader(new InputStreamReader(connection.getInputStream()));
                    Gson gson = new GsonBuilder().setPrettyPrinting().create();
                    JsonElement jsonResult = gson.fromJson(reader, JsonObject.class);
                    System.out.println(gson.toJson(jsonResult));
                }
            }
        } catch (Exception e) {
            System.out.println("error: " + e);
        }
        System.exit(0);
    }
}

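The complete recognition result is printed to the console as JSON. It contains the transcribed text and the start and end times of that text in the audio or video file, in milliseconds.

First result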

{
    "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
    "properties": {
        "audio_format": "pcm_s16le",
        "channels": [
            0
        ],
        "original_sampling_rate": 16000,
        "original_duration_in_milliseconds": 3834
    },
    "transcripts": [
        {
            "channel_id": 0,
            "content_duration_in_milliseconds": 2480,
            "text": "Hello World，这里是阿里巴巴语音实验室。",
            "sentences": [
                {
                    "begin_time": 760,
                    "end_time": 3240,
                    "text": "Hello World，这里是阿里巴巴语音实验室。",
                    "sentence_id": 1,
                    "words": [
                        {
                            "begin_time": 760,
                            "end_time": 1000,
                            "text": "Hello",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 1000,
                            "end_time": 1120,
                            "text": " World",
                            "punctuation": "，"
                        },
                        {
                            "begin_time": 1400,
                            "end_time": 1920,
                            "text": "这里是",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 1920,
                            "end_time": 2520,
                            "text": "阿里巴巴",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 2520,
                            "end_time": 2840,
                            "text": "语音",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 2840,
                            "end_time": 3240,
                            "text": "实验室",
                            "punctuation": "。"
                        }
                    ]
                }
            ]
        }
    ]
}

Second result

{
    "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav",
    "properties": {
        "audio_format": "pcm_s16le",
        "channels": [
            0
        ],
        "original_sampling_rate": 16000,
        "original_duration_in_milliseconds": 4726
    },
    "transcripts": [
        {
            "channel_id": 0,
            "content_duration_in_milliseconds": 3800,
            "text": "Hello World，这里是阿里巴巴语音实验室。",
            "sentences": [
                {
                    "begin_time": 680,
                    "end_time": 4480,
                    "text": "Hello World，这里是阿里巴巴语音实验室。",
                    "sentence_id": 1,
                    "words": [
                        {
                            "begin_time": 680,
                            "end_time": 960,
                            "text": "Hello",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 960,
                            "end_time": 1080,
                            "text": " World",
                            "punctuation": "，"
                        },
                        {
                            "begin_time": 1480,
                            "end_time": 2160,
                            "text": "这里是",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 2160,
                            "end_time": 3080,
                            "text": "阿里巴巴",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 3080,
                            "end_time": 3520,
                            "text": "语音",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 3520,
                            "end_time": 4480,
                            "text": "实验室",
                            "punctuation": "。"
                        }
                    ]
                }
            ]
        }
    ]
}
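
Because each sentence carries `begin_time` and `end_time` in milliseconds, a result like the ones above can be turned into subtitles with little code. The following is a minimal sketch that emits SubRip (SRT) cues, one per sentence; `ms_to_srt_time` and `result_to_srt` are hypothetical helper names, and the sketch assumes the result has already been downloaded and parsed into a dict with the structure shown above.

```python
def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def result_to_srt(result: dict) -> str:
    """Build SRT text with one cue per sentence, in the order sentences appear."""
    cues = []
    index = 1
    for transcript in result.get("transcripts", []):
        for sentence in transcript.get("sentences", []):
            start = ms_to_srt_time(sentence["begin_time"])
            end = ms_to_srt_time(sentence["end_time"])
            cues.append(f"{index}\n{start} --> {end}\n{sentence['text']}\n")
            index += 1
    return "\n".join(cues)


# Trimmed copy of the first sample result above.
sample = {
    "transcripts": [{
        "channel_id": 0,
        "sentences": [{
            "begin_time": 760,
            "end_time": 3240,
            "text": "Hello World，这里是阿里巴巴语音实验室。",
        }],
    }]
}
print(result_to_srt(sample))
```

For multi-channel audio the cues come out grouped by channel rather than sorted by time, so a real subtitle pipeline would probably sort sentences by `begin_time` first.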

Paraformer

Python

from http import HTTPStatus
from dashscope.audio.asr import Transcription
from urllib import request
import dashscope
import os
import json


# Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If the environment variable is not set, replace the next line with your Model Studio API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

task_response = Transcription.async_call(
    model='paraformer-v2',
    file_urls=['https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav',
               'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav'],
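    language_hints=['zh', 'en']  # language_hints is optional and specifies the language codes of the audio to recognize. Within the Paraformer series, only paraformer-v2 supports this parameter. See the API reference for valid values.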
)

transcription_response = Transcription.wait(task=task_response.output.task_id)

if transcription_response.status_code == HTTPStatus.OK:
    for transcription in transcription_response.output['results']:
        if transcription['subtask_status'] == 'SUCCEEDED':
            url = transcription['transcription_url']
            result = json.loads(request.urlopen(url).read().decode('utf8'))
            print(json.dumps(result, indent=4,
                            ensure_ascii=False))
        else:
            print('transcription failed!')
            print(transcription)
else:
    print('Error: ', transcription_response.output.message)

Java

import com.alibaba.dashscope.audio.asr.transcription.*;
import com.google.gson.*;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.List;

public class Main {
    public static void main(String[] args) {
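        // Create the transcription request parameters
        TranscriptionParam param =
                TranscriptionParam.builder()
                        // Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                        // If the environment variable is not set, replace the next line with your Model Studio API key: .apiKey("sk-xxx")
                        .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                        .model("paraformer-v2")
                        // language_hints is optional and specifies the language codes of the audio to recognize. Within the Paraformer series, only paraformer-v2 supports this parameter. See the API reference for valid values.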
                        .parameter("language_hints", new String[]{"zh", "en"})
                        .fileUrls(
                                Arrays.asList(
                                        "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
                                        "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav"))
                        .build();
        try {
            Transcription transcription = new Transcription();
            // Submit the transcription request
            TranscriptionResult result = transcription.asyncCall(param);
            System.out.println("RequestId: " + result.getRequestId());
            // Block until the task completes, then fetch the result
            result = transcription.wait(
                    TranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
            // Get the transcription results
            List<TranscriptionTaskResult> taskResultList = result.getResults();
            if (taskResultList != null && taskResultList.size() > 0) {
                for (TranscriptionTaskResult taskResult : taskResultList) {
                    String transcriptionUrl = taskResult.getTranscriptionUrl();
                    HttpURLConnection connection =
                            (HttpURLConnection) new URL(transcriptionUrl).openConnection();
                    connection.setRequestMethod("GET");
                    connection.connect();
                    BufferedReader reader =
                            new BufferedReader(new InputStreamReader(connection.getInputStream()));
                    Gson gson = new GsonBuilder().setPrettyPrinting().create();
                    JsonElement jsonResult = gson.fromJson(reader, JsonObject.class);
                    System.out.println(gson.toJson(jsonResult));
                }
            }
        } catch (Exception e) {
            System.out.println("error: " + e);
        }
        System.exit(0);
    }
}

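The complete recognition result is printed to the console as JSON. It contains the transcribed text and the start and end times of that text in the audio or video file, in milliseconds.

First result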

{
    "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav",
    "properties": {
        "audio_format": "pcm_s16le",
        "channels": [
            0
        ],
        "original_sampling_rate": 16000,
        "original_duration_in_milliseconds": 4726
    },
    "transcripts": [
        {
            "channel_id": 0,
            "content_duration_in_milliseconds": 4720,
            "text": "Hello world, 这里是阿里巴巴语音实验室。",
            "sentences": [
                {
                    "begin_time": 0,
                    "end_time": 4720,
                    "text": "Hello world, 这里是阿里巴巴语音实验室。",
                    "sentence_id": 1,
                    "words": [
                        {
                            "begin_time": 0,
                            "end_time": 629,
                            "text": "Hello ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 629,
                            "end_time": 944,
                            "text": "world",
                            "punctuation": ", "
                        },
                        {
                            "begin_time": 944,
                            "end_time": 1258,
                            "text": "这",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 1258,
                            "end_time": 1573,
                            "text": "里",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 1573,
                            "end_time": 1888,
                            "text": "是",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 1888,
                            "end_time": 2202,
                            "text": "阿",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 2202,
                            "end_time": 2517,
                            "text": "里",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 2517,
                            "end_time": 2832,
                            "text": "巴",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 2832,
                            "end_time": 3146,
                            "text": "巴",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 3146,
                            "end_time": 3461,
                            "text": "语",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 3461,
                            "end_time": 3776,
                            "text": "音",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 3776,
                            "end_time": 4090,
                            "text": "实",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 4090,
                            "end_time": 4405,
                            "text": "验",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 4405,
                            "end_time": 4720,
                            "text": "室",
                            "punctuation": "。"
                        }
                    ]
                }
            ]
        }
    ]
}

Second result

{
    "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
    "properties": {
        "audio_format": "pcm_s16le",
        "channels": [
            0
        ],
        "original_sampling_rate": 16000,
        "original_duration_in_milliseconds": 3834
    },
    "transcripts": [
        {
            "channel_id": 0,
            "content_duration_in_milliseconds": 3720,
            "text": "Hello word, 这里是阿里巴巴语音实验室。",
            "sentences": [
                {
                    "begin_time": 100,
                    "end_time": 3820,
                    "text": "Hello word, 这里是阿里巴巴语音实验室。",
                    "sentence_id": 1,
                    "words": [
                        {
                            "begin_time": 100,
                            "end_time": 596,
                            "text": "Hello ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 596,
                            "end_time": 844,
                            "text": "word",
                            "punctuation": ", "
                        },
                        {
                            "begin_time": 844,
                            "end_time": 1092,
                            "text": "这",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 1092,
                            "end_time": 1340,
                            "text": "里",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 1340,
                            "end_time": 1588,
                            "text": "是",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 1588,
                            "end_time": 1836,
                            "text": "阿",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 1836,
                            "end_time": 2084,
                            "text": "里",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 2084,
                            "end_time": 2332,
                            "text": "巴",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 2332,
                            "end_time": 2580,
                            "text": "巴",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 2580,
                            "end_time": 2828,
                            "text": "语",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 2828,
                            "end_time": 3076,
                            "text": "音",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 3076,
                            "end_time": 3324,
                            "text": "实",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 3324,
                            "end_time": 3572,
                            "text": "验",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 3572,
                            "end_time": 3820,
                            "text": "室",
                            "punctuation": "。"
                        }
                    ]
                }
            ]
        }
    ]
}
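
In the sample outputs in this document, each entry in `words` stores its trailing punctuation separately, and the Paraformer sample emits one entry per Chinese character. Concatenating `text` and `punctuation` in order reproduces the sentence text in these samples; treat that as an observation about the samples rather than a documented guarantee. A minimal sketch, with `text_from_words` as a hypothetical helper name and timestamps omitted for brevity:

```python
def text_from_words(sentence: dict) -> str:
    """Rebuild a sentence by joining each word's text and punctuation in order."""
    return "".join(
        word["text"] + word.get("punctuation", "")
        for word in sentence.get("words", [])
    )


# (text, punctuation) pairs copied from the first Paraformer sample above.
_pairs = [
    ("Hello ", ""), ("world", ", "), ("这", ""), ("里", ""), ("是", ""),
    ("阿", ""), ("里", ""), ("巴", ""), ("巴", ""), ("语", ""),
    ("音", ""), ("实", ""), ("验", ""), ("室", "。"),
]
sample_sentence = {
    "text": "Hello world, 这里是阿里巴巴语音实验室。",
    "words": [{"text": t, "punctuation": p} for t, p in _pairs],
}

rebuilt = text_from_words(sample_sentence)
print(rebuilt == sample_sentence["text"])
```

The same join is useful when only part of a sentence is needed, for example the words that fall inside a given time window.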

API reference

Model feature comparison

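Feature	Fun-ASR	Paraformer
Supported languages	Varies by model: fun-asr, fun-asr-2025-11-07: Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin; also Mandarin accents from the Zhongyuan, Southwestern, Ji-Lu, Jianghuai, Lan-Yin, Jiao-Liao, Northeastern, Beijing, and Hong Kong/Taiwan groups, including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia), English, Japanese; fun-asr-2025-08-25: Chinese (Mandarin), English; fun-asr-mtl, fun-asr-mtl-2025-08-25: Chinese (Mandarin, Cantonese), English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Swedish	Varies by model: paraformer-v2: Chinese (Mandarin, Cantonese, Wu, Minnan, and the Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, and Shanghai dialects), English, Japanese, Korean, German, French, Russian; paraformer-8k-v2: Chinese (Mandarin)
Supported audio formats	aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv	aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
Sampling rate	Any	Varies by model: paraformer-v2: any; paraformer-8k-v2: 8 kHz
Channels	Any	Any
Input	Publicly accessible URL of the file to recognize; up to 100 audio files	Publicly accessible URL of the file to recognize; up to 100 audio files
Audio size/duration	Each audio file must be no larger than 2 GB and no longer than 12 hours	Each audio file must be no larger than 2 GB and no longer than 12 hours
Emotion recognition	Not supported	Not supported
Timestamps	Supported; always on	Supported; off by default, can be enabled
Punctuation prediction	Supported; always on	Supported; always on
Hotwords	Supported; configurable	Supported; configurable
ITN (inverse text normalization)	Supported; always on	Supported; always on
Singing recognition	Supported; only fun-asr and fun-asr-2025-11-07	Not supported
Noise rejection	Supported; always on	Supported; always on
Sensitive-word filtering	Supported; filters content in the Alibaba Cloud Model Studio sensitive-word list by default; filtering additional content requires customization	Supported; filters content in the Alibaba Cloud Model Studio sensitive-word list by default; filtering additional content requires customization
Speaker diarization	Supported; off by default, can be enabled	Supported; off by default, can be enabled
Filler-word filtering	Not supported	Supported; off by default, can be enabled
VAD	Supported; always on	Supported; always on
Rate limit (RPS)	Job submission API: 10; task query API: 20	Job submission API: 20; task query API: 20
Access method	DashScope: Java/Python SDK, RESTful API	DashScope: Java/Python SDK, RESTful API
Price	International: $0.000035/second; Chinese mainland: $0.000032/second	Chinese mainland: $0.000012/second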
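
As a rough illustration of the per-second prices in the last row, the sketch below estimates the cost of transcribing a given amount of audio. `PRICE_PER_SECOND_USD` and `estimate_cost_usd` are hypothetical names, the keys `"intl"` and `"cn"` are made up for illustration, and the calculation assumes billing is strictly linear in audio duration with no rounding, minimum charge, or free quota; check the pricing page before relying on it.

```python
from decimal import Decimal

# Prices copied from the table above, in USD per second of audio.
PRICE_PER_SECOND_USD = {
    ("fun-asr", "intl"): Decimal("0.000035"),
    ("fun-asr", "cn"): Decimal("0.000032"),
    ("paraformer", "cn"): Decimal("0.000012"),
}


def estimate_cost_usd(family: str, region: str, duration_seconds: int) -> Decimal:
    """Estimate cost as price per second times duration (assumed linear billing)."""
    price = PRICE_PER_SECOND_USD[(family, region)]
    return price * Decimal(str(duration_seconds))


if __name__ == "__main__":
    one_hour = 3600
    print(estimate_cost_usd("fun-asr", "intl", one_hour))
    print(estimate_cost_usd("paraformer", "cn", 12 * one_hour))
```

`Decimal` is used instead of `float` so that small per-second prices multiply without binary rounding noise.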

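FAQ

Q: How can I improve recognition accuracy?

Consider the factors that affect accuracy together and take the corresponding measures.

Main factors:

Audio quality: the recording device, sampling rate, and ambient noise affect clarity (high-quality audio is the foundation)
Speaker characteristics: differences in pitch, speaking rate, accent, and dialect (especially rare dialects or heavy accents) make recognition harder
Language and vocabulary: mixed languages, technical terms, and slang make recognition harder (configuring hotwords can help)
Context: a lack of context easily leads to semantic ambiguity (especially where correct recognition depends on the surrounding text)

Optimization methods:

Improve audio quality: use a high-quality microphone and a device that records at the recommended sampling rate; reduce ambient noise and echo
Adapt to the speaker: for strong accents or dialects, choose a model that supports the dialect
Configure hotwords: set hotwords for technical terms, proper nouns, and similar vocabulary (see Custom hotwords)
Preserve context: avoid splitting audio into segments that are too short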