Define a custom vocabulary list to improve the recognition accuracy of domain-specific terms, product names, and specialized vocabulary.
The hotword feature is supported only in the primary workspace. Sub-workspaces don't support hotwords.
Overview
Some business-specific vocabulary — such as product names, proper nouns, and industry terms — isn't in the model's general lexicon and is often recognized incorrectly. Submit a vocabulary list to have the model prioritize these terms during decoding and improve recognition accuracy.
Hotword format
Submit hotwords as a JSON array. Each element defines a single hotword and its properties. A vocabulary list (referred to as vocabulary in the API) groups these hotwords for use with a specific model.
Example: Improve the recognition accuracy of movie titles.
[
{"text": "Seediq Bale", "weight": 4, "lang": "en"},
{"text": "Goodbye Mr. Loser", "weight": 4, "lang": "en"},
{"text": "Confucius' Family", "weight": 4, "lang": "en"}
]
Field description:
Field | Type | Required | Description |
text | string | Yes | The hotword text. Must be an actual word or phrase rather than an arbitrary character combination. The language must be supported by the selected model. For length limits, see Hotword text rules. |
weight | int | Yes | Hotword weight. Valid values: [1, 5]. Recommended: 4. A higher weight makes the model more likely to output the term. For tuning guidelines, see Adjust hotword weights. |
lang | string | No | Language code that restricts the hotword to a specific language. If the language is unknown, omit this field. Note: |
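As a rough illustration of the field rules in the table above, the following sketch checks a hotword payload locally before it is submitted. `validate_hotwords` is a hypothetical helper, not part of the DashScope SDK. It covers only the constraints listed in the table and does not check length limits or whether the selected model supports the language.

```python
import json


def validate_hotwords(hotwords):
    """Return a list of problems; an empty list means the payload passed these checks."""
    if not isinstance(hotwords, list):
        return ["payload must be a JSON array (a Python list)"]
    problems = []
    for i, item in enumerate(hotwords):
        if not isinstance(item, dict):
            problems.append(f"item {i}: must be an object")
            continue
        text = item.get("text")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"item {i}: 'text' is required and must be a non-empty string")
        weight = item.get("weight")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(weight, bool) or not isinstance(weight, int) or not 1 <= weight <= 5:
            problems.append(f"item {i}: 'weight' is required and must be an integer in [1, 5]")
        if "lang" in item and not isinstance(item["lang"], str):
            problems.append(f"item {i}: 'lang' must be a string when present")
    return problems


payload = json.loads(
    '[{"text": "Seediq Bale", "weight": 4, "lang": "en"},'
    ' {"text": "Goodbye Mr. Loser", "weight": 6}]'
)
for problem in validate_hotwords(payload):
    print(problem)
```

Returning a list of problems rather than raising on the first one makes it easier to fix a large list in a single pass.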
Prerequisites
Obtain an API key and configure it as an environment variable.
If calling through the DashScope SDK, install the latest SDK version.
Quick start
How it works
Create a vocabulary list first, then reference its ID when calling speech recognition:
Create a vocabulary list.
Call the Create Vocabulary List API. Specify target_model (targetModel in Java) to indicate which speech recognition model the list applies to.
Skip this step if a vocabulary list already exists. Use the Query All Vocabulary Lists API to check.
Call the speech recognition API with the vocabulary list ID.
The speech recognition model must match the target_model (targetModel in Java) set when the list was created; otherwise, hotwords don't take effect.
Sample code
End-to-end example: create a vocabulary list, call speech recognition, then delete the list. Sample audio file: asr_example.wav.
The hotword management API and the speech recognition API must use the same account. Otherwise, the recognition API can't access the vocabulary list.
Python
import dashscope
from dashscope.audio.asr import *
import os
# The API Keys for the Beijing and Singapore regions are different. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If the environment variable is not configured, replace the following line with: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')
# Singapore region URL; for Beijing region, change to: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Singapore region WebSocket URL; for Beijing region, change to: wss://dashscope.aliyuncs.com/api-ws/v1/inference
dashscope.base_websocket_api_url = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference'
prefix = 'testpfx'
target_model = "fun-asr-realtime"
my_vocabulary = [
    {"text": "Speech Laboratory", "weight": 4}
]
service = VocabularyService()
vocabulary_id = service.create_vocabulary(
    prefix=prefix,
    target_model=target_model,
    vocabulary=my_vocabulary)
try:
    if service.query_vocabulary(vocabulary_id)['status'] == 'OK':
        recognition = Recognition(model=target_model,
                                  format='wav',
                                  sample_rate=16000,
                                  callback=None,
                                  vocabulary_id=vocabulary_id)
        result = recognition.call('asr_example.wav')
        print(result.output)
finally:
    # Delete the hotword list regardless of whether recognition succeeded, to avoid occupying quota
    service.delete_vocabulary(vocabulary_id)
Java
import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.audio.asr.vocabulary.Vocabulary;
import com.alibaba.dashscope.audio.asr.vocabulary.VocabularyService;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
public class Main {
    // The API Keys for the Beijing and Singapore regions are different. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    // If the environment variable is not configured, replace the following line with: public static String apiKey = "sk-xxx"
    public static String apiKey = System.getenv("DASHSCOPE_API_KEY");

    public static void main(String[] args) throws NoApiKeyException, InputRequiredException {
        // Singapore region URL; for Beijing region, change to: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        // Singapore region WebSocket URL; for Beijing region, change to: wss://dashscope.aliyuncs.com/api-ws/v1/inference
        Constants.baseWebsocketApiUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference";
        String targetModel = "fun-asr-realtime";
        JsonArray vocabularyJson = new JsonArray();
        List<Hotword> wordList = new ArrayList<>();
        wordList.add(new Hotword("Speech Laboratory", 4));
        for (Hotword word : wordList) {
            JsonObject jsonObject = new JsonObject();
            jsonObject.addProperty("text", word.text);
            jsonObject.addProperty("weight", word.weight);
            vocabularyJson.add(jsonObject);
        }
        VocabularyService service = new VocabularyService(apiKey);
        Vocabulary vocabulary = service.createVocabulary(targetModel, "testpfx", vocabularyJson);
        try {
            if ("OK".equals(service.queryVocabulary(vocabulary.getVocabularyId()).getStatus())) {
                Recognition recognizer = new Recognition();
                RecognitionParam param =
                        RecognitionParam.builder()
                                .model(targetModel)
                                .apiKey(apiKey)
                                .format("wav")
                                .sampleRate(16000)
                                .vocabularyId(vocabulary.getVocabularyId())
                                .build();
                try {
                    System.out.println("Recognition result: " + recognizer.call(param, new File("asr_example.wav")));
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    // Close the WebSocket connection
                    recognizer.getDuplexApi().close(1000, "bye");
                }
            }
        } finally {
            // Delete the hotword list regardless of whether recognition succeeded, to avoid occupying quota
            service.deleteVocabulary(vocabulary.getVocabularyId());
        }
        System.exit(0);
    }
}

class Hotword {
    String text;
    int weight;

    public Hotword(String text, int weight) {
        this.text = text;
        this.weight = weight;
    }
}
Advanced usage
Adjust hotword weights
Weight controls how strongly the model favors a hotword. Set it appropriately to improve target word accuracy without introducing false recognitions.
Weight | Effect | Best for |
1-2 | Slight preference | Hotwords that sound similar to common words, where overcorrection must be avoided |
3-4 | Clear preference (recommended) | The best starting point for most scenarios |
5 | Forced preference | Use only when the term appears frequently in the audio and is unlikely to be confused with other words. Note: An excessively high weight can cause phonetically similar words to be misrecognized as the hotword. |
Start with weight=4 and adjust incrementally based on recognition results.
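One way to read this guidance is as a loop: change the weight by one step, re-run recognition, and inspect the errors. The helper below is a hypothetical sketch of that decision rule. The function name and its two boolean inputs are assumptions made for illustration, and the service itself offers no such API.

```python
MIN_WEIGHT = 1
MAX_WEIGHT = 5


def next_weight(current, missed_hotword, false_triggers):
    """Suggest the next weight to try, one step at a time, clamped to [1, 5].

    missed_hotword: the term was spoken but not recognized.
    false_triggers: similar-sounding words were recognized as the hotword.
    """
    if false_triggers:
        # Overcorrection is the costlier error, so back off first.
        proposal = current - 1
    elif missed_hotword:
        proposal = current + 1
    else:
        proposal = current
    return max(MIN_WEIGHT, min(MAX_WEIGHT, proposal))


# Starting from the recommended weight of 4, the term is still being missed.
print(next_weight(4, missed_hotword=True, false_triggers=False))
```

Checking `false_triggers` before `missed_hotword` is a design choice: when both occur, the sketch lowers the weight rather than raising it.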
Hotword text rules
Hotword text must be an actual word or phrase. Length limits are as follows:
Contains non-ASCII characters: The total character count, counting both non-ASCII characters (such as Chinese, Japanese kana, Korean Hangul, and Cyrillic) and ASCII characters, must not exceed 15.
Examples:
✅ "厄洛替尼盐酸盐" (7 characters)
✅ "EGFR抑制剂" (7 characters, of which EGFR accounts for 4 ASCII characters)
✅ "こんにちは" (5 characters)
✅ "Фенибут Белфарм" (15 characters, including the space)
❌ "Клофелин Белмедпрепараты" (24 characters)
Pure ASCII characters: The number of space-delimited segments must not exceed 7.
Examples:
✅ "Exothermic reaction" = 2 segments
✅ "Human immunodeficiency virus type 1" = 5 segments
❌ "The effect of temperature variations on enzyme activity in biochemical reactions" = 11 segments
Design hotword lists
Group by scenario: Create separate vocabulary lists for different business scenarios (for example, one for medical terms and another for product names) to simplify maintenance and reuse.
Mix multiple languages: A single vocabulary list can contain terms in different languages. Use the lang field to distinguish them. When language_hints is specified during speech recognition, only hotwords that match the specified language take effect.
Clean up regularly: Delete unused vocabulary lists to free up quota. Each account supports up to 10 lists.
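The grouping advice above might look like this in practice: terms are kept in one place with a scenario tag, then split into one payload per scenario. `group_by_scenario`, the scenario names, and the `"zh"` language code are assumptions made for illustration. Check the model documentation for the language codes it accepts.

```python
from collections import defaultdict


def group_by_scenario(terms):
    """Build one hotword payload per scenario.

    terms: iterable of (scenario, text, weight, lang) tuples; lang may be None.
    """
    lists = defaultdict(list)
    for scenario, text, weight, lang in terms:
        entry = {"text": text, "weight": weight}
        if lang:
            # Omit 'lang' entirely when the language is unknown.
            entry["lang"] = lang
        lists[scenario].append(entry)
    return dict(lists)


terms = [
    ("medical", "Human immunodeficiency virus type 1", 4, "en"),
    ("medical", "EGFR抑制剂", 4, "zh"),
    ("movies", "Seediq Bale", 4, None),
]
for scenario, payload in group_by_scenario(terms).items():
    print(scenario, payload)
```

Each resulting payload can then be submitted as its own vocabulary list, so a scenario can be updated or deleted without touching the others.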
Limits and billing
Limit | Description |
Number of vocabulary lists | 10 per account, shared across all models. Submit a ticket to request a quota increase. |
Hotwords per list | Up to 500 hotwords per vocabulary list. |
Billing | Free of charge. |
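A local check against these limits can catch quota problems before any API call is made. The sketch below is hypothetical: `check_quota` is not an SDK function, and the constants reflect the default limits in the table, which may differ if a quota increase has been granted.

```python
MAX_LISTS_PER_ACCOUNT = 10   # default quota, shared across all models
MAX_HOTWORDS_PER_LIST = 500


def check_quota(existing_list_count, new_lists):
    """Return a list of limit violations for the lists about to be created.

    existing_list_count: number of vocabulary lists already in the account.
    new_lists: dict mapping a local list name to its hotword entries.
    """
    problems = []
    total = existing_list_count + len(new_lists)
    if total > MAX_LISTS_PER_ACCOUNT:
        problems.append(
            f"{total} lists would exceed the quota of {MAX_LISTS_PER_ACCOUNT}"
        )
    for name, hotwords in new_lists.items():
        if len(hotwords) > MAX_HOTWORDS_PER_LIST:
            problems.append(
                f"list '{name}' has {len(hotwords)} hotwords; "
                f"the limit is {MAX_HOTWORDS_PER_LIST}"
            )
    return problems


print(check_quota(9, {"medical": [], "movies": []}))
```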
Supported scope
Available models vary by deployment scope:
International
Fun-ASR:
Real-time speech recognition: fun-asr-realtime, fun-asr-realtime-2025-11-07
Non-real-time speech recognition: fun-asr, fun-asr-2025-11-07, fun-asr-2025-08-25, fun-asr-mtl, fun-asr-mtl-2025-08-25
Chinese mainland
Fun-ASR:
Real-time speech recognition: fun-asr-realtime, fun-asr-realtime-2025-11-07, fun-asr-realtime-2025-09-15, fun-asr-flash-8k-realtime, fun-asr-flash-8k-realtime-2026-01-28
Non-real-time speech recognition: fun-asr, fun-asr-2025-11-07, fun-asr-2025-08-25, fun-asr-mtl, fun-asr-mtl-2025-08-25
Paraformer:
Real-time speech recognition: paraformer-realtime-v2, paraformer-realtime-8k-v2
Non-real-time speech recognition: paraformer-v2, paraformer-8k-v2
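For a quick local check before creating a list, the model names above can be kept in a lookup table. This is only a sketch: the scope keys `international` and `chinese_mainland` are labels invented here, `supports_hotwords` is a hypothetical helper, and the table is a snapshot of this page that goes stale as models are added.

```python
FUN_ASR_COMMON = {
    "fun-asr-realtime", "fun-asr-realtime-2025-11-07",
    "fun-asr", "fun-asr-2025-11-07", "fun-asr-2025-08-25",
    "fun-asr-mtl", "fun-asr-mtl-2025-08-25",
}

HOTWORD_MODELS = {
    "international": set(FUN_ASR_COMMON),
    "chinese_mainland": FUN_ASR_COMMON | {
        "fun-asr-realtime-2025-09-15",
        "fun-asr-flash-8k-realtime", "fun-asr-flash-8k-realtime-2026-01-28",
        "paraformer-realtime-v2", "paraformer-realtime-8k-v2",
        "paraformer-v2", "paraformer-8k-v2",
    },
}


def supports_hotwords(model, scope):
    """Return True if the model is listed for hotwords in the given scope."""
    return model in HOTWORD_MODELS.get(scope, set())


print(supports_hotwords("paraformer-v2", "international"))
print(supports_hotwords("paraformer-v2", "chinese_mainland"))
```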
API reference
FAQ
Q: Why don't hotwords improve recognition accuracy?
Check the following in order:
Model mismatch: The target_model specified when creating the list must match the model used by the speech recognition API. A mismatch doesn't cause an error, and recognition still returns results, but the hotwords don't take effect. If the results don't contain expected hotwords, check this first.
Unsupported model: The model must belong to the Fun-ASR or Paraformer family. Other families don't support hotwords. Calling the API with an unsupported model doesn't return an error, but the results may be empty or lack hotword enhancement. If using a model such as SenseVoice, check this first.
Inappropriate weight: Increase the weight from 4 to 5 and observe the results. If phonetically similar words start being misrecognized as the hotword, reduce it back to 4.
Hotword list status: Use the Query API to confirm that status is OK.
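The checklist above can be sketched as a single function that runs the checks in the same order. Everything here is hypothetical: `diagnose` is not an SDK call, and inferring the model family from a name prefix is an assumption that holds for the model names on this page but is not guaranteed in general.

```python
SUPPORTED_PREFIXES = ("fun-asr", "paraformer")  # assumed family prefixes


def diagnose(target_model, recognition_model, weight, list_status):
    """Return findings in the order of the checklist; empty means nothing obvious."""
    findings = []
    if target_model != recognition_model:
        findings.append(
            f"model mismatch: list targets '{target_model}' "
            f"but recognition uses '{recognition_model}'"
        )
    if not recognition_model.startswith(SUPPORTED_PREFIXES):
        findings.append(f"unsupported model family: '{recognition_model}'")
    if weight < 5:
        findings.append(
            f"weight is {weight}: try one step higher and watch for false triggers"
        )
    if list_status != "OK":
        findings.append(f"list status is '{list_status}', expected 'OK'")
    return findings


for finding in diagnose("fun-asr", "fun-asr-realtime", 4, "OK"):
    print(finding)
```

Because a mismatch or an unsupported model does not raise an error in the API, a local check like this is one of the few places such a problem can be made visible.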
Q: Are hotwords used differently in real-time and file-based recognition?
Hotword lists are created the same way. The calling method differs:
Real-time speech recognition: Pass vocabulary_id in the Recognition or WebSocket connection parameters.
File-based speech recognition: Pass vocabulary_id in the Transcription request parameters.
In both cases, target_model must match the speech recognition model used in the API call.
Q: How can I improve recognition accuracy beyond hotwords?
In addition to hotwords, consider the following:
Audio quality: Match the sample rate to the model requirements (16 kHz or 8 kHz) and reduce background noise.
Choose the right model: Different scenarios call for different models. For details, see the Speech-to-text model selection guide.
Specify the language: Declare the audio language through language_hints to improve accuracy in single-language scenarios.
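For the audio quality point, the sample rate of a WAV file can be read with Python's standard `wave` module before the file is sent for recognition. This is a minimal sketch: the helper names are hypothetical, and the expected rate (16000 or 8000 Hz) depends on the model in use.

```python
import wave


def wav_sample_rate(path):
    """Read the sample rate (Hz) from a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def sample_rate_matches(path, expected_rate):
    """Return True if the file's sample rate equals the rate the model expects."""
    return wav_sample_rate(path) == expected_rate


# Example usage (assumes asr_example.wav exists in the working directory):
# if not sample_rate_matches("asr_example.wav", 16000):
#     print("Resample the audio before calling speech recognition")
```

The check reads only the header, so it is cheap to run on every file. It applies to WAV input only; other formats need a different reader.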