Custom hotwords for speech recognition - Alibaba Cloud Model Studio

Define a custom vocabulary list to improve the recognition accuracy of domain-specific terms, product names, and specialized vocabulary.

Important

The hotword feature is supported only in the primary workspace. Sub-workspaces don't support hotwords.

Overview

Some business-specific vocabulary — such as product names, proper nouns, and industry terms — isn't in the model's general lexicon and is often recognized incorrectly. Submit a vocabulary list to have the model prioritize these terms during decoding and improve recognition accuracy.

Hotword format

Submit hotwords as a JSON array. Each element defines a single hotword and its properties. A vocabulary list (referred to as vocabulary in the API) groups these hotwords for use with a specific model.

Example: Improve the recognition accuracy of movie titles.

[
    {"text": "Seediq Bale", "weight": 4, "lang": "en"},
    {"text": "Goodbye Mr. Loser", "weight": 4, "lang": "en"},
    {"text": "Confucius' Family", "weight": 4, "lang": "en"}
]

Field description:

Field	Type	Required	Description
text	string	Yes	The hotword text. Must be an actual word or phrase rather than an arbitrary character combination. The language must be supported by the selected model. For length limits, see Hotword text rules.
weight	int	Yes	Hotword weight. Valid values: integers from 1 to 5. Recommended: 4. A higher weight makes the model more likely to output the term. For tuning guidelines, see Adjust hotword weights.
lang	string	No	Language code that restricts the hotword to a specific language. If the language is unknown, omit this field. Note: `language_hints` is a parameter of the speech recognition API, not the hotword API. It declares the language of the audio. When `language_hints` is set, only hotwords whose language matches are applied. Hotwords in other languages are ignored.
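The field rules above can be checked locally before a list is submitted. The following is a minimal sketch, not part of the SDK: `validate_hotword` is a hypothetical helper name, and it covers only the type and range constraints stated in the table.

```python
def validate_hotword(entry):
    """Return a list of problems found in one hotword entry (empty if none)."""
    problems = []

    text = entry.get("text")
    if not isinstance(text, str) or not text.strip():
        problems.append("text must be a non-empty string")

    weight = entry.get("weight")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(weight, bool) or not isinstance(weight, int) or not 1 <= weight <= 5:
        problems.append("weight must be an integer from 1 to 5")

    # lang is optional; when present it must be a string.
    if "lang" in entry and not isinstance(entry["lang"], str):
        problems.append("lang, when present, must be a string")

    return problems


vocabulary = [
    {"text": "Seediq Bale", "weight": 4, "lang": "en"},
    {"text": "Goodbye Mr. Loser", "weight": 6},  # weight out of range
]
for entry in vocabulary:
    print(entry["text"], validate_hotword(entry))
```

The check does not verify that `text` is an actual word or that the language is supported by the selected model; those rules still depend on the service.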

Prerequisites

Create an API key and configure it as an environment variable.
If calling through the DashScope SDK, install the latest SDK version.

Quick start

How it works

Create a vocabulary list first, then reference its ID when calling speech recognition:

1. Create a vocabulary list.
   Call the Create Vocabulary List API. Specify target_model (targetModel in Java) to indicate which speech recognition model the list applies to.
   Skip this step if a vocabulary list already exists. Use the Query All Vocabulary Lists API to check.
2. Call the speech recognition API with the vocabulary list ID.
   The speech recognition model must match the target_model (targetModel in Java) set when the list was created; otherwise, hotwords don't take effect.

Sample code

End-to-end example: create a vocabulary list, call speech recognition, then delete the list. Sample audio file: asr_example.wav.

Note

The hotword management API and the speech recognition API must use the same account. Otherwise, the recognition API can't access the vocabulary list.

Python

import dashscope
from dashscope.audio.asr import Recognition, VocabularyService
import os

# The API Keys for the Beijing and Singapore regions are different. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If the environment variable is not configured, replace the following line with: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')

# Singapore region URL; for Beijing region, change to: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Singapore region WebSocket URL; for Beijing region, change to: wss://dashscope.aliyuncs.com/api-ws/v1/inference
dashscope.base_websocket_api_url = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference'
prefix = 'testpfx'
target_model = "fun-asr-realtime"

my_vocabulary = [
    {"text": "Speech Laboratory", "weight": 4}
]

service = VocabularyService()
vocabulary_id = service.create_vocabulary(
    prefix=prefix,
    target_model=target_model,
    vocabulary=my_vocabulary)

try:
    if service.query_vocabulary(vocabulary_id)['status'] == 'OK':
        recognition = Recognition(
            model=target_model,
            format='wav',
            sample_rate=16000,
            callback=None,
            vocabulary_id=vocabulary_id)
        result = recognition.call('asr_example.wav')
        print(result.output)
finally:
    # Delete the hotword list regardless of whether recognition succeeded, to avoid occupying quota
    service.delete_vocabulary(vocabulary_id)

Java

import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.audio.asr.vocabulary.Vocabulary;
import com.alibaba.dashscope.audio.asr.vocabulary.VocabularyService;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class Main {
    // The API Keys for the Beijing and Singapore regions are different. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    // If the environment variable is not configured, replace the following line with: public static String apiKey = "sk-xxx"
    public static String apiKey = System.getenv("DASHSCOPE_API_KEY");

    public static void main(String[] args) throws NoApiKeyException, InputRequiredException {
        // Singapore region URL; for Beijing region, change to: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        // Singapore region WebSocket URL; for Beijing region, change to: wss://dashscope.aliyuncs.com/api-ws/v1/inference
        Constants.baseWebsocketApiUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference";

        String targetModel = "fun-asr-realtime";

        JsonArray vocabularyJson = new JsonArray();
        List<Hotword> wordList = new ArrayList<>();
        wordList.add(new Hotword("Speech Laboratory", 4));

        for (Hotword word : wordList) {
            JsonObject jsonObject = new JsonObject();
            jsonObject.addProperty("text", word.text);
            jsonObject.addProperty("weight", word.weight);
            vocabularyJson.add(jsonObject);
        }

        VocabularyService service = new VocabularyService(apiKey);
        Vocabulary vocabulary = service.createVocabulary(targetModel, "testpfx", vocabularyJson);

        try {
            if ("OK".equals(service.queryVocabulary(vocabulary.getVocabularyId()).getStatus())) {
                Recognition recognizer = new Recognition();
                RecognitionParam param =
                        RecognitionParam.builder()
                                .model(targetModel)
                                .apiKey(apiKey)
                                .format("wav")
                                .sampleRate(16000)
                                .vocabularyId(vocabulary.getVocabularyId())
                                .build();

                try {
                    System.out.println("Recognition result: " + recognizer.call(param, new File("asr_example.wav")));
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    // Close the WebSocket connection
                    recognizer.getDuplexApi().close(1000, "bye");
                }
            }
        } finally {
            // Delete the hotword list regardless of whether recognition succeeded, to avoid occupying quota
            service.deleteVocabulary(vocabulary.getVocabularyId());
        }
        System.exit(0);
    }
}

class Hotword {
    String text;
    int weight;

    public Hotword(String text, int weight) {
        this.text = text;
        this.weight = weight;
    }
}

Advanced usage

Adjust hotword weights

Weight controls how strongly the model favors a hotword. Set it appropriately to improve target word accuracy without introducing false recognitions.

Weight	Effect	Best for
1-2	Slight preference	Hotwords that sound similar to common words, where overcorrection must be avoided
3-4	Clear preference (recommended)	The best starting point for most scenarios
5	Forced preference	Use only when the term appears frequently in the audio and is unlikely to be confused with other words. Note: An excessively high weight can cause phonetically similar words to be misrecognized as the hotword.

Start with weight=4 and adjust incrementally based on recognition results.
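One way to make the incremental adjustment systematic is sketched below. It is an illustration only: `next_weight` is a hypothetical helper, and the `missed` and `false_hits` counts are assumed to come from your own review of recognition results, not from the API.

```python
MIN_WEIGHT, MAX_WEIGHT = 1, 5  # valid weight range from the field description


def next_weight(current, missed, false_hits):
    """Suggest the next weight to try for one hotword.

    missed:     times the hotword was spoken but not recognized
    false_hits: times a similar-sounding word was output as the hotword
    """
    # Overcorrection is treated as the more serious problem, so it is checked first.
    if false_hits > 0:
        return max(MIN_WEIGHT, current - 1)
    if missed > 0:
        return min(MAX_WEIGHT, current + 1)
    return current


print(next_weight(4, missed=2, false_hits=0))  # 5
print(next_weight(4, missed=0, false_hits=3))  # 3
```

Changing the weight by one step at a time keeps each change small enough to attribute differences in the results to it.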

Hotword text rules

Hotword text must be an actual word or phrase. Length limits are as follows:

Contains non-ASCII characters: The total character count — including non-ASCII characters (such as Chinese, Japanese kana, Korean Hangul, and Cyrillic) and ASCII characters combined — must not exceed 15.
Examples:
- ✅ "厄洛替尼盐酸盐" (7 characters)
- ✅ "EGFR抑制剂" (7 characters, of which EGFR accounts for 4 ASCII characters)
- ✅ "こんにちは" (5 characters)
- ✅ "Фенибут Белфарм" (15 characters, including the space)
- ❌ "Клофелин Белмедпрепараты" (24 characters)
Pure ASCII characters: The number of space-delimited segments must not exceed 7.
Examples:
- ✅ "Exothermic reaction" = 2 segments
- ✅ "Human immunodeficiency virus type 1" = 5 segments
- ❌ "The effect of temperature variations on enzyme activity in biochemical reactions" = 11 segments
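The two length rules can be expressed as a short local check. This is a sketch under one assumption: that a "character" corresponds to one Unicode code point, which is how Python's `len` counts. The function name `check_hotword_text` is hypothetical.

```python
def check_hotword_text(text):
    """Return True if text satisfies the length limits described above."""
    if text.isascii():
        # Pure ASCII: at most 7 space-delimited segments (and at least one).
        return 0 < len(text.split()) <= 7
    # Contains non-ASCII: at most 15 characters in total, spaces included.
    return len(text) <= 15


examples = [
    "厄洛替尼盐酸盐",
    "EGFR抑制剂",
    "Фенибут Белфарм",
    "Клофелин Белмедпрепараты",
    "Human immunodeficiency virus type 1",
    "The effect of temperature variations on enzyme activity in biochemical reactions",
]
for text in examples:
    print(check_hotword_text(text), text)
```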

Design hotword lists

Group by scenario: Create separate vocabulary lists for different business scenarios (for example, one for medical terms and another for product names) to simplify maintenance and reuse.
Mix multiple languages: A single vocabulary list can contain terms in different languages. Use the lang field to distinguish them. When language_hints is specified during speech recognition, only hotwords that match the specified language take effect.
Clean up regularly: Delete unused vocabulary lists to free up quota. Each account supports up to 10 lists.
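To reason about which entries of a mixed-language list apply to a given request, the matching rule can be simulated locally. This sketch runs entirely on the client and does not reflect the service's internal logic. It assumes that entries without a lang field are not restricted to any language and therefore stay eligible; the source does not state this explicitly. `effective_hotwords` is a hypothetical helper.

```python
def effective_hotwords(vocabulary, language_hints=None):
    """Return the hotwords expected to apply for the given language_hints."""
    if not language_hints:
        # No language declared for the audio: nothing is filtered out.
        return list(vocabulary)
    hints = set(language_hints)
    # Assumption: an entry without "lang" is unrestricted and stays eligible.
    return [hw for hw in vocabulary if "lang" not in hw or hw["lang"] in hints]


mixed = [
    {"text": "Speech Laboratory", "weight": 4, "lang": "en"},
    {"text": "こんにちは", "weight": 4, "lang": "ja"},
    {"text": "EGFR", "weight": 3},  # language unknown, so lang is omitted
]
print([hw["text"] for hw in effective_hotwords(mixed, ["en"])])
```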

Limits and billing

Limit	Description
Number of vocabulary lists	10 per account, shared across all models. Submit a ticket to request a quota increase.
Hotwords per list	Up to 500 hotwords per vocabulary list.
Billing	Free of charge.
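When a term collection is larger than one list allows, it has to be split across lists while staying within the account quota. The sketch below shows the arithmetic only; `split_into_lists` and its `existing_lists` parameter are hypothetical, and the constants are the default limits from the table above.

```python
MAX_HOTWORDS_PER_LIST = 500
MAX_LISTS_PER_ACCOUNT = 10  # default quota; can be raised by ticket


def split_into_lists(hotwords, existing_lists=0):
    """Split hotwords into chunks of at most 500 and check the list quota."""
    chunks = [
        hotwords[i:i + MAX_HOTWORDS_PER_LIST]
        for i in range(0, len(hotwords), MAX_HOTWORDS_PER_LIST)
    ]
    if existing_lists + len(chunks) > MAX_LISTS_PER_ACCOUNT:
        raise ValueError(
            f"{len(chunks)} new lists plus {existing_lists} existing lists "
            f"exceed the quota of {MAX_LISTS_PER_ACCOUNT}"
        )
    return chunks


words = [{"text": f"term {i}", "weight": 4} for i in range(1200)]
print([len(chunk) for chunk in split_into_lists(words)])  # [500, 500, 200]
```

Because each list is tied to one target model, each chunk would still need its own create call and its own vocabulary ID.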

Supported scope

Available models vary by deployment scope:

International

Fun-ASR:
- Real-time speech recognition: fun-asr-realtime, fun-asr-realtime-2025-11-07
- Non-real-time speech recognition: fun-asr, fun-asr-2025-11-07, fun-asr-2025-08-25, fun-asr-mtl, fun-asr-mtl-2025-08-25

Chinese mainland

Fun-ASR:
- Real-time speech recognition: fun-asr-realtime, fun-asr-realtime-2025-11-07, fun-asr-realtime-2025-09-15, fun-asr-flash-8k-realtime, fun-asr-flash-8k-realtime-2026-01-28
- Non-real-time speech recognition: fun-asr, fun-asr-2025-11-07, fun-asr-2025-08-25, fun-asr-mtl, fun-asr-mtl-2025-08-25
Paraformer:
- Real-time speech recognition: paraformer-realtime-v2, paraformer-realtime-8k-v2
- Non-real-time speech recognition: paraformer-v2, paraformer-8k-v2
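Because an unsupported model does not raise an error, a client-side lookup can catch the problem before a request is sent. The sketch below copies the model names from the lists above; the scope keys ("international", "chinese_mainland") and the function name are hypothetical labels, and the table would need updating whenever the supported models change.

```python
# Fun-ASR models available in both deployment scopes.
_FUN_ASR_COMMON = {
    "fun-asr-realtime", "fun-asr-realtime-2025-11-07",
    "fun-asr", "fun-asr-2025-11-07", "fun-asr-2025-08-25",
    "fun-asr-mtl", "fun-asr-mtl-2025-08-25",
}

HOTWORD_MODELS = {
    "international": set(_FUN_ASR_COMMON),
    "chinese_mainland": _FUN_ASR_COMMON | {
        "fun-asr-realtime-2025-09-15",
        "fun-asr-flash-8k-realtime", "fun-asr-flash-8k-realtime-2026-01-28",
        "paraformer-realtime-v2", "paraformer-realtime-8k-v2",
        "paraformer-v2", "paraformer-8k-v2",
    },
}


def supports_hotwords(model, scope):
    """Return True if the model is listed as hotword-capable in the scope."""
    return model in HOTWORD_MODELS.get(scope, set())


print(supports_hotwords("paraformer-v2", "international"))     # False
print(supports_hotwords("paraformer-v2", "chinese_mainland"))  # True
```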

API reference

Custom Hotword API Reference

FAQ

Q: Why don't hotwords improve recognition accuracy?

Check the following in order:

Model mismatch: The target_model specified when creating the list must match the model used by the speech recognition API. A mismatch doesn't cause an error, and recognition still returns results, but the hotwords don't take effect. If the results don't contain expected hotwords, check this first.
Unsupported model: The model must belong to the Fun-ASR or Paraformer family. Other families don't support hotwords. Calling the API with an unsupported model doesn't return an error, but the results may be empty or lack hotword enhancement. If you are using a model from another family, such as SenseVoice, this is the likely cause.
Inappropriate weight: Increase the weight from 4 to 5 and observe the results. If phonetically similar words start being misrecognized as the hotword, reduce it back to 4.
Hotword list status: Use the Query API to confirm that status is OK.
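The mechanical parts of this checklist can be run as a local pre-flight check. This is a rough sketch: `diagnose` is a hypothetical helper, the model family is inferred from the name prefix (an assumption, not an API guarantee), and the weight item is left out because it needs manual review of recognition results.

```python
SUPPORTED_PREFIXES = ("fun-asr", "paraformer")  # assumption: family inferred from name


def diagnose(list_target_model, recognition_model, list_status):
    """Return findings for the checks that can be automated, in checklist order."""
    findings = []
    if list_target_model != recognition_model:
        findings.append(
            f"model mismatch: list targets {list_target_model!r}, "
            f"recognition uses {recognition_model!r}"
        )
    if not recognition_model.startswith(SUPPORTED_PREFIXES):
        findings.append(f"unsupported model family: {recognition_model!r}")
    if list_status != "OK":
        findings.append(f"vocabulary list status is {list_status!r}, expected 'OK'")
    return findings


for finding in diagnose("fun-asr", "fun-asr-realtime", "OK"):
    print(finding)
```

An empty result means only that these three checks passed; it does not confirm that the hotwords take effect.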

Q: Are hotwords used differently in real-time and file-based recognition?

Hotword lists are created the same way. The calling method differs:

Real-time speech recognition: Pass vocabulary_id in the Recognition or WebSocket connection parameters.
File-based speech recognition: Pass vocabulary_id in the Transcription request parameters.

In both cases, target_model must match the speech recognition model used in the API call.

Q: How can I improve recognition accuracy beyond hotwords?

In addition to hotwords, consider the following:

Audio quality: Match the sample rate to the model requirements (16 kHz or 8 kHz) and reduce background noise.
Choose the right model: Different scenarios call for different models. For details, see the Speech-to-text model selection guide.
Specify the language: Declare the audio language through language_hints to improve accuracy in single-language scenarios.
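For the audio quality item, the sample rate of a WAV file can be read with Python's standard `wave` module before the file is sent for recognition. This is a minimal sketch: the helper names are hypothetical, and the set of accepted rates reflects only the 16 kHz and 8 kHz values mentioned above. Which rate a particular model expects is not checked here.

```python
import wave

EXPECTED_RATES = {8000, 16000}  # the 8 kHz and 16 kHz rates mentioned above


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in the header of a WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def rate_is_expected(path):
    """Return True if the file's sample rate is one of the expected rates."""
    return wav_sample_rate(path) in EXPECTED_RATES
```

For example, `rate_is_expected("asr_example.wav")` could be called before recognition, and the file resampled if it returns False. The `wave` module reads uncompressed PCM WAV files only, so other formats need a different tool.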