| Parameter | Type | Description |
| --- | --- | --- |
| type | string | Always response.done. |
| response | object | The response object. |
| response.id | string | Unique ID of the response. |
| response.object | string | Object type. Always realtime.response for this event. |
| response.output | array | Output of the response. |
| response.usage | object | Billing information for this speech synthesis request. |
| response.usage.characters | integer | Number of billable characters (Qwen3-TTS Realtime). |
| response.usage.total_tokens | integer | Total length, in tokens, of the input plus the output (synthesized audio) (Qwen-TTS Realtime). |
| response.usage.input_tokens | integer | Total length of the input, in tokens (Qwen-TTS Realtime). |
| response.usage.output_tokens | integer | Total length of the output, in tokens (Qwen-TTS Realtime). |
| response.usage.input_tokens_details | object | Breakdown of the input length, in tokens (Qwen-TTS Realtime). |
| response.usage.input_tokens_details.text_tokens | integer | Total length of the input text, in tokens (Qwen-TTS Realtime). |
| response.usage.output_tokens_details | object | Breakdown of the output length, in tokens (Qwen-TTS Realtime). |
| response.usage.output_tokens_details.text_tokens | integer | Total length of the output text, in tokens (Qwen-TTS Realtime). |
| response.usage.output_tokens_details.audio_tokens | integer | Total length of the output audio, in tokens (Qwen-TTS Realtime). Audio is converted to tokens at a rate of 50 tokens per second; audio shorter than 1 second is counted as 50 tokens. |

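The conversion rule for response.usage.output_tokens_details.audio_tokens can be inverted to get a rough playback duration from a billed token count. The following is a minimal sketch, assuming the 50 tokens per second rate is applied proportionally above one second (how fractional seconds are rounded is not stated here). The function name estimate_audio_seconds is hypothetical and not part of the API.

```python
TOKENS_PER_SECOND = 50  # conversion rate stated for audio_tokens


def estimate_audio_seconds(audio_tokens: int) -> float:
    """Estimate the audio duration implied by a billed audio_tokens value.

    Assumption: tokens scale linearly with duration. Because clips shorter
    than 1 second are billed as 50 tokens, a result of 1.0 only means
    "at most about 1 second".
    """
    if audio_tokens < 0:
        raise ValueError("audio_tokens must be non-negative")
    return audio_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    for tokens in (50, 64, 500):
        print(tokens, "tokens ->", estimate_audio_seconds(tokens), "s (estimate)")
```

Treat the result as an upper-bound estimate for short clips, since any audio under one second is billed at the 50-token minimum.
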
Qwen3-TTS Realtime:

{
  "event_id": "event_Aemy83XqHFFDDSeJIDn6N",
  "type": "response.done",
  "response": {
    "id": "resp_LFeR42yXZ9SxUAeXjmyTz",
    "object": "realtime.response",
    "conversation_id": "",
    "status": "completed",
    "modalities": [
      "text",
      "audio"
    ],
    "voice": "Cherry",
    "output": [
      {
        "id": "item_Ae1lv2XmRljRSG96L8Zm1",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
          {
            "type": "audio",
            "transcript": ""
          }
        ]
      }
    ],
    "usage": {
      "characters": 25
    }
  }
}

Qwen-TTS Realtime:

{
  "event_id": "event_xxx",
  "type": "response.done",
  "response": {
    "id": "resp_xxx",
    "object": "realtime.response",
    "conversation_id": "",
    "status": "completed",
    "modalities": [
      "text",
      "audio"
    ],
    "voice": "Cherry",
    "output": [
      {
        "id": "item_FIrYGaNVK3rbIZqeY4QjM",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
          {
            "type": "audio",
            "transcript": ""
          }
        ]
      }
    ],
    "usage": {
      "total_tokens": 67,
      "input_tokens": 3,
      "output_tokens": 64,
      "input_tokens_details": {
        "text_tokens": 3
      },
      "output_tokens_details": {
        "text_tokens": 0,
        "audio_tokens": 64
      }
    }
  }
}
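
Because the usage object differs between the two model families (characters for Qwen3-TTS Realtime, token counts for Qwen-TTS Realtime), a client may want to normalize it before logging or billing checks. The following is a minimal sketch, assuming the event arrives as a JSON string shaped like the samples above. The helper name summarize_usage and the keys of the dictionary it returns are hypothetical and not part of the API.

```python
import json
from typing import Any


def summarize_usage(event_json: str) -> dict[str, Any]:
    """Extract billing fields from a response.done event (illustrative helper)."""
    event = json.loads(event_json)
    if event.get("type") != "response.done":
        raise ValueError("not a response.done event")

    usage = event.get("response", {}).get("usage", {})

    # Qwen3-TTS Realtime samples report a billable character count.
    if "characters" in usage:
        return {"billing": "characters", "characters": usage["characters"]}

    # Qwen-TTS Realtime samples report token counts with nested details.
    output_details = usage.get("output_tokens_details", {})
    return {
        "billing": "tokens",
        "total_tokens": usage.get("total_tokens"),
        "input_tokens": usage.get("input_tokens"),
        "output_tokens": usage.get("output_tokens"),
        "audio_tokens": output_details.get("audio_tokens"),
    }


if __name__ == "__main__":
    sample = json.dumps(
        {"type": "response.done", "response": {"usage": {"characters": 25}}}
    )
    print(summarize_usage(sample))
```

Branching on the presence of characters, rather than on a model name, keeps the helper dependent only on fields that appear in the event itself.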