Server-sent events for real-time speech recognition (Paraformer) - Alibaba Cloud Model Studio

Reference for the server-sent events that the Paraformer real-time speech recognition service pushes to clients over WebSocket. This topic documents the data structure and field semantics of the four event types: task-started, result-generated, task-finished, and task-failed.

User guide: For model details and selection guidance, see Speech-to-text.

Event interaction flow: For the event sequence diagram, see WebSocket API.

task-started

Description: The task has started successfully. The client can begin sending audio data.

header object

Properties

task_id string

Client-generated task ID (UUID format).

event string

Event type. Always task-started.

attributes object

Additional attributes. Typically empty.

{
    "header": {
        "task_id": "2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx",
        "event": "task-started",
        "attributes": {}
    },
    "payload": {}
}

payload object

Always {}.
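
The following is a minimal sketch of how a client might read the header of an incoming message and decide whether it can begin sending audio. It assumes each WebSocket text frame carries one JSON event shaped like the example above. The names parse_event and can_send_audio are illustrative and are not part of the service API.

```python
import json
from typing import Tuple


def parse_event(raw: str) -> Tuple[str, str, dict]:
    """Split one server message into (event type, task ID, payload)."""
    message = json.loads(raw)
    header = message["header"]
    # Assumption: treat a missing or null payload as an empty object.
    payload = message.get("payload") or {}
    return header["event"], header["task_id"], payload


# The task-started example from this section, serialized as it might arrive.
raw = json.dumps({
    "header": {
        "task_id": "2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx",
        "event": "task-started",
        "attributes": {},
    },
    "payload": {},
})

event, task_id, payload = parse_event(raw)
# Audio should only be sent after task-started has been received.
can_send_audio = event == "task-started"
```

Comparing the returned task_id with the ID the client generated is a cheap way to make sure the event belongs to the expected task.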

result-generated

Description: Recognition result. Includes intermediate results (sentence_end=false) and final results (sentence_end=true).

header object

Properties

task_id string

Client-generated task ID (UUID format).

event string

Event type. Always result-generated.

attributes object

Additional attributes. Typically empty.

{
  "header": {
    "task_id": "2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx",
    "event": "result-generated",
    "attributes": {}
  },
  "payload": {
    "output": {
      "sentence": {
        "begin_time": 170,
        "end_time": null,
        "text": "Okay, I got it.",
        "heartbeat": false,
        "sentence_end": true,
        "words": [
          {
            "begin_time": 170,
            "end_time": 295,
            "text": "Okay",
            "punctuation": ","
          },
          {
            "begin_time": 295,
            "end_time": 503,
            "text": "I",
            "punctuation": ""
          },
          {
            "begin_time": 503,
            "end_time": 711,
            "text": "got",
            "punctuation": ""
          },
          {
            "begin_time": 711,
            "end_time": 920,
            "text": "it",
            "punctuation": ""
          }
        ]
      }
    },
    "usage": {
      "duration": 3
    }
  }
}
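
As a rough sketch, a client could separate intermediate results from final ones, skip heartbeat packets, and read the billable duration only when it is present. The function name handle_result and the shape of its return value are assumptions made for illustration, and the intermediate payload below is a hypothetical example rather than captured service output.

```python
from typing import Optional


def handle_result(payload: dict) -> Optional[dict]:
    """Summarize a result-generated payload, or return None for heartbeats."""
    sentence = payload["output"]["sentence"]
    if sentence.get("heartbeat"):
        return None  # heartbeat packets can be skipped
    is_final = bool(sentence.get("sentence_end"))
    usage = payload.get("usage")  # null (None) while the sentence is ongoing
    return {
        "kind": "final" if is_final else "intermediate",
        "text": sentence["text"],
        "billable_seconds": usage["duration"] if is_final and usage else None,
    }


# Final result, trimmed from the example above.
final = handle_result({
    "output": {"sentence": {"text": "Okay, I got it.",
                            "heartbeat": False, "sentence_end": True}},
    "usage": {"duration": 3},
})

# Hypothetical intermediate result: sentence_end is false and usage is null.
partial = handle_result({
    "output": {"sentence": {"text": "Okay",
                            "heartbeat": False, "sentence_end": False}},
    "usage": None,
})
```

A typical UI would overwrite the displayed line with each intermediate text and commit it once a final result arrives.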

payload object

Properties

usage object

When payload.output.sentence.sentence_end is false (the current sentence has not ended), usage is null.

When payload.output.sentence.sentence_end is true (the current sentence has ended), usage.duration indicates the billable duration of the current task.

Properties

duration integer

Billable task duration, in seconds.

output object

Properties

sentence object

Properties

begin_time integer

Sentence start time, in milliseconds.

end_time integer

Sentence end time, in milliseconds.

text string

Recognized text.

heartbeat boolean

Whether this message is a heartbeat packet. If true, the result can be skipped.

sentence_end boolean

Whether the sentence has ended (true=final result, false=intermediate result).

emo_tag string

Important

Only paraformer-realtime-8k-v2 supports this feature.
This feature requires semantic segmentation to be disabled. Set semantic_punctuation_enabled to false in the run-task event.
The emotion recognition result is returned only when payload.output.sentence.sentence_end is true.

Emotion of the current sentence. Possible values:

positive: Positive sentiment, such as happy or satisfied.
negative: Negative sentiment, such as angry or dejected.
neutral: No distinct sentiment.

emo_confidence float

Important

Only paraformer-realtime-8k-v2 supports this feature.
This feature requires semantic segmentation to be disabled. Set semantic_punctuation_enabled to false in the run-task event.
The emotion recognition result is returned only when payload.output.sentence.sentence_end is true.

Emotion confidence score in the range [0.0, 1.0]. A higher value indicates greater confidence.

words array[object]

Word-level timestamp information.

Properties

begin_time integer

Word start time, in milliseconds.

end_time integer

Word end time, in milliseconds.

text string

Recognized text.

punctuation string

Punctuation mark.
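
The sentence fields described above could be consumed along the following lines. This is only a sketch: word_spans and emotion are hypothetical helper names, and the second sentence object, with its emo_tag and emo_confidence values, is invented for illustration.

```python
from typing import List, Optional, Tuple


def word_spans(sentence: dict) -> List[Tuple[str, int]]:
    """Return (text with punctuation, duration in ms) for each word."""
    return [
        (w["text"] + w.get("punctuation", ""), w["end_time"] - w["begin_time"])
        for w in sentence.get("words", [])
    ]


def emotion(sentence: dict) -> Optional[Tuple[str, Optional[float]]]:
    """Return (emo_tag, emo_confidence) when present on a final sentence."""
    if not sentence.get("sentence_end") or "emo_tag" not in sentence:
        return None
    return sentence["emo_tag"], sentence.get("emo_confidence")


# First two words of the example above.
sentence = {
    "sentence_end": True,
    "words": [
        {"begin_time": 170, "end_time": 295, "text": "Okay", "punctuation": ","},
        {"begin_time": 295, "end_time": 503, "text": "I", "punctuation": ""},
    ],
}
spans = word_spans(sentence)

# Hypothetical final sentence that carries emotion fields.
tagged = {"sentence_end": True, "emo_tag": "neutral", "emo_confidence": 0.9}
tag_and_score = emotion(tagged)
```

Reading emo_tag with a presence check keeps the same code usable with models or settings that do not return emotion fields.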

task-finished

Description: The task ended normally. You can close the connection or reuse it.

header object

Properties

task_id string

Client-generated task ID (UUID format).

event string

Event type. Always task-finished.

attributes object

Additional attributes. Typically empty.

{
    "header": {
        "task_id": "2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx",
        "event": "task-finished",
        "attributes": {}
    },
    "payload": {
        "output": {},
        "usage": null
    }
}

payload object

The contents can be ignored. The fields are typically empty: in the example above, output is {} and usage is null.

task-failed

Description: The task failed. The connection is closed and cannot be reused.

header object

Properties

task_id string

Client-generated task ID (UUID format).

event string

Event type. Always task-failed.

error_code string

Error code that identifies the type of error, for example CLIENT_ERROR.

error_message string

Specific cause of the error.

attributes object

Additional attributes. Typically empty.

{
    "header": {
        "task_id": "2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx",
        "event": "task-failed",
        "error_code": "CLIENT_ERROR",
        "error_message": "request timeout after 23 seconds.",
        "attributes": {}
    },
    "payload": {}
}

payload object

Always {}.
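
One possible way to treat the two terminal events is sketched below: task-finished ends the receive loop normally, while task-failed is surfaced as an exception that carries error_code and error_message. TaskFailedError and is_task_done are illustrative names, not part of any SDK.

```python
class TaskFailedError(Exception):
    """Raised when the server reports a task-failed event."""

    def __init__(self, task_id: str, error_code: str, error_message: str):
        super().__init__(f"task {task_id} failed: {error_code}: {error_message}")
        self.task_id = task_id
        self.error_code = error_code
        self.error_message = error_message


def is_task_done(message: dict) -> bool:
    """Return True on task-finished, raise on task-failed, else False."""
    header = message["header"]
    if header["event"] == "task-failed":
        raise TaskFailedError(
            header["task_id"],
            header.get("error_code", ""),
            header.get("error_message", ""),
        )
    return header["event"] == "task-finished"


# The task-failed example from this section.
failed = {
    "header": {
        "task_id": "2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx",
        "event": "task-failed",
        "error_code": "CLIENT_ERROR",
        "error_message": "request timeout after 23 seconds.",
        "attributes": {},
    },
    "payload": {},
}

try:
    is_task_done(failed)
except TaskFailedError as exc:
    failure = exc  # the connection cannot be reused after this event
```

Because the connection cannot be reused after task-failed, a caller that wants to retry would open a new connection and start a new task.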