Two WebSocket client events control a Paraformer real-time speech recognition task: run-task starts the task with the model and audio settings, and finish-task ends the task after the audio stream completes. This page describes the message structure and field semantics of both events.
User guide: For model details and selection guidance, see Speech-to-text.
Event flow: For the event interaction sequence, see WebSocket API.
run-task
Description: Starts a speech recognition task and configures parameters such as the model, audio format, and sample rate.
When to send: Immediately after the WebSocket connection is established.
Response event: The server returns the task-started event. Send audio data only after this event is received.
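As a rough illustration, the run-task message can be assembled with the Python standard library. This is a minimal sketch, not an official client: the helper name `build_run_task` and its signature are hypothetical, and the field names and fixed values are copied from the sample message in this section.

```python
import json
import uuid


def build_run_task(model, audio_format, sample_rate, **extra_parameters):
    """Return (task_id, message_text) for a run-task event.

    Hypothetical helper; fixed values mirror the sample message on this page.
    """
    # Client-generated UUID; finish-task must reuse the same value.
    task_id = str(uuid.uuid4())
    message = {
        "header": {
            "action": "run-task",
            "task_id": task_id,
            "streaming": "duplex",
        },
        "payload": {
            "task_group": "audio",
            "task": "asr",
            "function": "recognition",
            "model": model,
            "parameters": {
                "format": audio_format,
                "sample_rate": sample_rate,
                # Optional settings such as language_hints go here.
                **extra_parameters,
            },
            "input": {},
        },
    }
    return task_id, json.dumps(message)


task_id, run_task_text = build_run_task(
    "paraformer-realtime-v2", "pcm", 16000, language_hints=["en"]
)
```

Keep the returned `task_id`, because the finish-task event must carry the same value.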
|
header object (Required)
Properties
action string (Required)
Instruction type. Set to run-task.
task_id string (Required)
Client-generated task ID in UUID format. Used to correlate subsequent events with this task.
streaming string (Required)
Streaming mode. Set to duplex, as in the example message.
|
{
  "header": {
    "action": "run-task",
    "task_id": "2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx",
    "streaming": "duplex"
  },
  "payload": {
    "task_group": "audio",
    "task": "asr",
    "function": "recognition",
    "model": "paraformer-realtime-v2",
    "parameters": {
      "format": "pcm",
      "sample_rate": 16000,
      "disfluency_removal_enabled": false,
      "language_hints": [
        "en"
      ]
    },
    "input": {}
  }
}
|
|
payload object (Required)
Properties
task_group string (Required)
Task group. Set to audio.
function string (Required)
Function type. Set to recognition.
parameters object (Required)
Speech recognition parameters.
Properties
format string (Required)
Audio format.
Valid values:
- pcm
- wav
- mp3
- opus
- speex
- aac
- amr
Important
Paraformer enforces the following constraints:
- opus and speex: Must use Ogg encapsulation.
- wav: Must use PCM encoding.
- amr: Only AMR-NB is supported.
sample_rate integer (Required)
Sample rate, in Hz.
Valid values:
vocabulary_id string (Optional)
disfluency_removal_enabled boolean (Optional)
Important
Only Paraformer supports this parameter.
Whether to filter out filler words.
Default: false.
language_hints array[string] (Optional)
Language of the audio to recognize. No default value. If not set, the model detects the language automatically.
Valid values:
- Paraformer:
  - zh: Chinese
  - en: English
  - ja: Japanese
  - yue: Cantonese
  - ko: Korean
  - de: German
  - fr: French
  - ru: Russian
semantic_punctuation_enabled boolean (Optional)
Important
Only Paraformer v2 supports this parameter.
Whether to enable semantic-based sentence segmentation.
Default: false.
Semantic-based segmentation is more accurate and suits meeting transcription. VAD-based (Voice Activity Detection) segmentation has lower latency and suits interactive scenarios.
max_sentence_silence integer (Optional)
Silence threshold for VAD-based sentence segmentation, in milliseconds. The system ends the current sentence when silence after a speech segment exceeds this threshold.
Default: 1300.
Valid range: [200, 6000].
multi_threshold_mode_enabled boolean (Optional)
Whether to enable multi-threshold mode. When enabled, this mode prevents VAD-based segmentation from producing overly long segments.
Default: false.
punctuation_prediction_enabled boolean (Optional)
Important
Only Paraformer v2 supports this parameter.
Whether to add punctuation to the recognition results.
Default: true.
heartbeat boolean (Optional)
Important
Only Paraformer v2 supports this parameter.
Whether to enable heartbeat packets.
Default: false.
inverse_text_normalization_enabled boolean (Optional)
Important
Only Paraformer v2 supports this parameter.
Whether to enable Inverse Text Normalization (ITN). When enabled, Chinese numerals are converted to Arabic numerals.
Default: true.
|
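The constraints listed for the parameters object can be checked on the client before the message is sent. The sketch below is only an illustration: `check_parameters` is a hypothetical helper, it covers only the values and ranges stated above (format values, the [200, 6000] range for max_sentence_silence, and the language_hints codes), and it does not check sample_rate values because none are listed here.

```python
VALID_FORMATS = {"pcm", "wav", "mp3", "opus", "speex", "aac", "amr"}
VALID_LANGUAGE_HINTS = {"zh", "en", "ja", "yue", "ko", "de", "fr", "ru"}


def check_parameters(parameters):
    """Return a list of problems found in a run-task parameters dict."""
    problems = []
    # format and sample_rate are the two required parameters.
    for name in ("format", "sample_rate"):
        if name not in parameters:
            problems.append(f"missing required parameter: {name}")
    audio_format = parameters.get("format")
    if audio_format is not None and audio_format not in VALID_FORMATS:
        problems.append(f"unsupported format: {audio_format}")
    # The documented range [200, 6000] is treated as inclusive.
    silence = parameters.get("max_sentence_silence")
    if silence is not None and not 200 <= silence <= 6000:
        problems.append("max_sentence_silence must be within [200, 6000]")
    for hint in parameters.get("language_hints", []):
        if hint not in VALID_LANGUAGE_HINTS:
            problems.append(f"unsupported language hint: {hint}")
    return problems
```

An empty list means no problem was found among the checked constraints. The server remains the authority on what it accepts.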
finish-task
Description: Notifies the server that all audio data has been sent and requests that the task be ended.
When to send: After all audio data has been sent.
Response event: The server returns the task-finished event.
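The finish-task message is small enough to build in a few lines. The following is a minimal sketch using the standard library. `build_finish_task` is a hypothetical helper name, and the structure follows the sample message in this section.

```python
import json


def build_finish_task(task_id):
    """Return the finish-task message text for an existing task.

    task_id must be the same value that was sent in run-task.
    """
    message = {
        "header": {
            "action": "finish-task",
            "task_id": task_id,
            "streaming": "duplex",
        },
        # The sample message carries an empty input object.
        "payload": {"input": {}},
    }
    return json.dumps(message)


finish_task_text = build_finish_task("2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx")
```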
|
header object (Required)
Properties
action string (Required)
Instruction type. Set to finish-task.
task_id string (Required)
Client-generated task ID in UUID format. Must match the task_id used in the run-task event.
streaming string (Required)
Streaming mode. Set to duplex, as in the example message.
|
{
  "header": {
    "action": "finish-task",
    "task_id": "2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx",
    "streaming": "duplex"
  },
  "payload": {
    "input": {}
  }
}
|
|
payload object (Required)
|
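The ordering of the two client events can be sketched without committing to a particular WebSocket library. The code below is an illustration under stated assumptions, not a reference client: `run_recognition` and `_wait_for` are hypothetical names, the three transport callables are supplied by the caller, audio is assumed to be sent as binary frames, and server events are assumed to carry their name in `header.event`. None of these assumptions is defined on this page.

```python
import json


def _wait_for(recv_text, event_name):
    """Read server messages until one with the given event name arrives."""
    # Assumption: the event name is found at header.event.
    while True:
        event = json.loads(recv_text())
        if event.get("header", {}).get("event") == event_name:
            return event


def run_recognition(send_text, send_binary, recv_text,
                    run_task_text, task_id, audio_chunks):
    """Drive one task in the documented order.

    send_text, send_binary and recv_text are caller-supplied callables
    that wrap whatever WebSocket library is in use.
    """
    # 1. Send run-task right after the connection is established.
    send_text(run_task_text)
    # 2. Audio may only be sent after task-started arrives.
    _wait_for(recv_text, "task-started")
    for chunk in audio_chunks:
        send_binary(chunk)
    # 3. After all audio is sent, send finish-task with the same task_id.
    finish = {
        "header": {
            "action": "finish-task",
            "task_id": task_id,
            "streaming": "duplex",
        },
        "payload": {"input": {}},
    }
    send_text(json.dumps(finish))
    # 4. The server answers with task-finished.
    return _wait_for(recv_text, "task-finished")
```

This sketch ignores every server message other than the two it waits for. A real client would also need to handle recognition results and failures, which are outside the scope of this page.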