Server-side events - Alibaba Cloud Model Studio

This topic describes the server-side events of the qwen3-livetranslate-flash-realtime API.

Reference: Real-time audio and video translation - Qwen

error

An error message returned by the server.

event_id string

The unique identifier of this event.

{
  "event_id": "event_RoUu4T8yExPMI37GKwaOC",
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_value",
    "message": "Invalid modalities: ['audio']. Supported combinations are: ['text'] and ['audio', 'text'].",
    "param": "session.modalities"
  }
}

type string

The event type. The value is always error.

error object

The error details.

Properties

type string

The error type.

code string

The error code.

message string

The error message.

param string

The parameter related to the error, such as session.modalities.
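
As a rough illustration of how a client might consume this event, the sketch below parses a raw error event and builds a one-line log message. The function name format_error is hypothetical and not part of any SDK; only the field names are taken from the example above.

```python
import json


def format_error(raw: str) -> str:
    """Build a one-line summary from a raw `error` event (JSON text)."""
    event = json.loads(raw)
    if event.get("type") != "error":
        raise ValueError("not an error event")
    err = event["error"]
    # `param` names the offending parameter; treat it as possibly absent.
    param = err.get("param")
    suffix = f" (param: {param})" if param else ""
    return f"[{err['type']}/{err['code']}] {err['message']}{suffix}"


raw_event = json.dumps({
    "event_id": "event_RoUu4T8yExPMI37GKwaOC",
    "type": "error",
    "error": {
        "type": "invalid_request_error",
        "code": "invalid_value",
        "message": "Invalid modalities: ['audio'].",
        "param": "session.modalities",
    },
})
summary = format_error(raw_event)
```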

session.created

When a client connects, the server returns this event first. It contains the default configuration for the connection.

event_id string

The unique identifier of this event.

{
    "event_id": "event_QxBGpjBDmDDQQWDtrqBKB",
    "type": "session.created",
    "session": {
        "id": "sess_OozZ1vtbPt2muDflHODIH",
        "object": "realtime.session",
        "model": "qwen3-livetranslate-flash-realtime",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm24",
        "translation": {
           "language": "en"
        }
    }
}

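type string

The event type. The value is always session.created.

session object

The session configuration.

Properties

id string

The unique identifier of the session.

object string

The value is always realtime.session.

model string

The model in use.

modalities array

The output modality settings of the model.

voice string

The voice used for audio generated by the model.

input_audio_format string

The format of the input audio. The value is always pcm16.

output_audio_format string

The format of the output audio. The value is always pcm24.

translation object (optional)

The translation configuration.

Properties

language string (optional)

The target language for translation.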
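
The following sketch shows one way a client could read the default configuration out of a decoded session.created event. The SessionConfig class and parse_session function are hypothetical names chosen for illustration; the field names follow the example above, and translation is treated as optional.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SessionConfig:
    id: str
    model: str
    modalities: List[str]
    voice: str
    input_audio_format: str
    output_audio_format: str
    target_language: Optional[str]  # taken from translation.language, if present


def parse_session(event: dict) -> SessionConfig:
    """Extract the session configuration from a decoded session event."""
    session = event["session"]
    translation = session.get("translation") or {}
    return SessionConfig(
        id=session["id"],
        model=session["model"],
        modalities=list(session["modalities"]),
        voice=session["voice"],
        input_audio_format=session["input_audio_format"],
        output_audio_format=session["output_audio_format"],
        target_language=translation.get("language"),
    )


created_event = {
    "event_id": "event_QxBGpjBDmDDQQWDtrqBKB",
    "type": "session.created",
    "session": {
        "id": "sess_OozZ1vtbPt2muDflHODIH",
        "object": "realtime.session",
        "model": "qwen3-livetranslate-flash-realtime",
        "modalities": ["text", "audio"],
        "voice": "Cherry",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm24",
        "translation": {"language": "en"},
    },
}
config = parse_session(created_event)
```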

session.updated

After the server receives a session.update request, it returns this event if the update succeeds. If an error occurs, it returns an error event instead.

event_id string

The unique identifier of this event.

{
    "event_id": "event_QxBGpjBDmDDQQWDtrqBKB",
    "type": "session.updated",
    "session": {
        "id": "sess_OozZ1vtbPt2muDflHODIH",
        "object": "realtime.session",
        "model": "qwen3-livetranslate-flash-realtime",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Ethan",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm24",
        "translation": {
           "language": "en"
        }
    }
}

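type string

The event type. The value is always session.updated.

session object

The session configuration.

Properties

id string

The unique identifier of the session.

object string

The value is always realtime.session.

model string

The model in use.

modalities array

The output modality settings of the model.

voice string

The voice used for audio generated by the model.

input_audio_format string

The format of the input audio. The value is always pcm16.

output_audio_format string

The format of the output audio. The value is always pcm24.

translation object (optional)

The translation configuration.

Properties

language string (optional)

The target language for translation.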
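
To confirm that an update took effect, a client could compare the session object from session.created with the one from session.updated. The helper below is a minimal sketch under that assumption; changed_fields is a hypothetical name, and it compares top-level session fields only.

```python
def changed_fields(before: dict, after: dict) -> dict:
    """Return {field: (old, new)} for top-level session fields that differ."""
    keys = set(before) | set(after)
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(keys)
        if before.get(key) != after.get(key)
    }


created_session = {
    "id": "sess_OozZ1vtbPt2muDflHODIH",
    "modalities": ["text", "audio"],
    "voice": "Cherry",
    "translation": {"language": "en"},
}
# Same session after a session.update request that changed only the voice.
updated_session = dict(created_session, voice="Ethan")

changes = changed_fields(created_session, updated_session)
```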

session.finished

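This event indicates that the session has ended and that all speech translation for the current session is complete.

The server sends this event only after the client sends a session.finish request. After receiving this event, the client can disconnect.

event_id string

The unique identifier of this event.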

{
    "event_id": "event_xxx",
    "type": "session.finished"
}

type string

The event type. The value is always session.finished.
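
Because the client can disconnect once this event arrives, a receive loop might stop reading at that point. The generator below is a transport-agnostic sketch: events_until_finished is a hypothetical helper, and events stands for any iterable of already-decoded event dictionaries (how they are read from the connection is outside the scope of this example).

```python
from typing import Iterable, Iterator


def events_until_finished(events: Iterable[dict]) -> Iterator[dict]:
    """Yield events up to and including session.finished, then stop."""
    for event in events:
        yield event
        if event.get("type") == "session.finished":
            return  # the caller can now close the connection


stream = [
    {"type": "response.done"},
    {"type": "session.finished"},
    {"type": "error"},  # never consumed: iteration stops before this entry
]
seen = [event["type"] for event in events_until_finished(stream)]
```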

response.created

The server returns this event when it generates a new model response.

event_id string

The unique identifier of this event.

{
    "event_id": "event_L8hHVI5jYis6BzAjnPWJh",
    "type": "response.created",
    "response": {
        "id": "resp_P79OOMs8LnrXVpiIHUCKR",
        "object": "realtime.response",
        "conversation_id": "conv_UFClXtYkRkFXrs48y8pmK",
        "status": "in_progress",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "output_audio_format": "pcm24",
        "output": []
    }
}

type string

The event type. The value is always response.created.

response object

The response object.

Properties

id string

The unique identifier of the response.

conversation_id string

The unique identifier of the current session.

object string

The object type. For this event, the value is always realtime.response.

status string

The response status. Valid values:

completed
failed
in_progress
incomplete

modalities array

The response modalities.

voice string

The voice of the generated audio.

output_audio_format string

The format of the output audio. The value is fixed to pcm24.

output array

Currently always empty in this event.

response.done

The server returns this event after a response has been generated. The response object in the event contains all output items except the raw audio data.

event_id string

The unique identifier of this event.

{
  "event_id": "event_CNea8oXNipVanSg2VIzkO",
  "type": "response.done",
  "response": {
    "id": "resp_TfhYTqej692vsGA2jNEtH",
    "object": "realtime.response",
    "conversation_id": "conv_ZtyLfKVm8XqLwYRlsuDih",
    "status": "completed",
    "modalities": [
      "text",
      "audio"
    ],
    "voice": "Cherry",
    "output_audio_format": "pcm24",
    "output": [
      {
        "id": "item_MKtkMwN9RtcyE9eJShyWy",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
          {
            "type": "audio",
            "transcript": "Hello? "
          }
        ]
      }
    ],
    "usage": {
      "total_tokens": 56,
      "input_tokens": 47,
      "output_tokens": 9,
      "input_tokens_details": {
        "text_tokens": 20,
        "audio_tokens": 27
      },
      "output_tokens_details": {
        "text_tokens": 2,
        "audio_tokens": 7
      }
    }
  }
}

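type string

The event type. The value is always response.done.

response object

The response object.

Properties

id string

The unique identifier of the response.

conversation_id string

The unique identifier of the current session.

object string

The object type. For this event, the value is always realtime.response.

status string

The status of the response.

modalities array

The modalities of the response.

voice string

The voice used for audio generated by the model.

output_audio_format string

The format of the output audio. The value is always pcm24.

output array

The output of the response.

Properties

id string

The unique identifier of the response output.

type string

The type of the output item. Currently, the value is always message.

object string

The object type of the output item. Currently, the value is always realtime.item.

status string

The status of the output item.

role string

The role of the output item.

content array

The content of the output item.

Properties

type string

The type of the output content. The value is text for plain-text output, or audio when the output includes audio.

text string

The text content of the output.

transcript string

The text transcript of the audio content.

usage object

The token consumption information for this response.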
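
As an illustration of reading this event, the sketch below collects the translated text and the total token count from a decoded response.done event. summarize_response is a hypothetical helper; it reads transcript for audio parts and falls back to text, matching the content properties listed above.

```python
def summarize_response(event: dict) -> dict:
    """Collect status, output text, and token total from response.done."""
    response = event["response"]
    texts = []
    for item in response.get("output", []):
        for part in item.get("content", []):
            # Audio parts carry `transcript`; text parts carry `text`.
            value = part.get("transcript") or part.get("text")
            if value:
                texts.append(value)
    usage = response.get("usage", {})
    return {
        "status": response["status"],
        "text": "".join(texts),
        "total_tokens": usage.get("total_tokens"),
    }


done_event = {
    "type": "response.done",
    "response": {
        "id": "resp_TfhYTqej692vsGA2jNEtH",
        "status": "completed",
        "output": [
            {
                "id": "item_MKtkMwN9RtcyE9eJShyWy",
                "type": "message",
                "content": [{"type": "audio", "transcript": "Hello? "}],
            }
        ],
        "usage": {"total_tokens": 56, "input_tokens": 47, "output_tokens": 9},
    },
}
result = summarize_response(done_event)
```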

response.text.text

When the output modality is text only, the server returns this event as the model generates text incrementally.

event_id string

The unique identifier of this event.

{
    "event_id": "event_B1lIeyOXR7qJMEExbqtTG",
    "type": "response.text.text",
    "response_id": "resp_B1lIdtjF4Noqpn5NOjznj",
    "item_id": "item_B1lIdJsAJlJiFs8ztWpJt",
    "output_index": 0,
    "content_index": 0,
    "text": "How are"
}

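type string

The event type. The value is always response.text.text.

text string

The incremental text that is returned.

response_id string

The response ID.

item_id string

The unique identifier of the message item.

output_index integer

Currently, the value is always 0.

content_index integer

Currently, the value is always 0.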

response.text.done

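The server returns this event when the model finishes generating text for a text-only output.

The server also returns this event if the response is interrupted, incomplete, or canceled.

event_id string

The unique identifier of this event.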

{
    "event_id": "event_B1lIeE2Nac33zn5V7h2mm",
    "type": "response.text.done",
    "response_id": "resp_B1lIdtjF4Noqpn5NOjznj",
    "item_id": "item_B1lIdJsAJlJiFs8ztWpJt",
    "output_index": 0,
    "content_index": 0,
    "text": "How can I assist you today?"
}

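type string

The event type. The value is always response.text.done.

response_id string

The unique identifier of the response.

item_id string

The unique identifier of the message item.

output_index integer

Currently, the value is always 0.

content_index integer

Currently, the value is always 0.

text string

The complete text output from the model.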
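
The two text events can be combined on the client: response.text.text supplies a running preview and response.text.done supplies the final text. The class below is a sketch, not a reference implementation. TextAccumulator is a hypothetical name, and the code assumes each response.text.text event carries only newly generated text to append; if the service sends cumulative text instead, the append should become a replacement.

```python
from typing import Dict, List


class TextAccumulator:
    """Track preview and final text per response_id (text-only output)."""

    def __init__(self) -> None:
        self._chunks: Dict[str, List[str]] = {}
        self.final: Dict[str, str] = {}

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        rid = event.get("response_id")
        if kind == "response.text.text":
            # Assumption: `text` is a delta, so append it to the preview.
            self._chunks.setdefault(rid, []).append(event["text"])
        elif kind == "response.text.done":
            # The done event carries the complete text; prefer it over deltas.
            self.final[rid] = event["text"]
            self._chunks.pop(rid, None)

    def preview(self, response_id: str) -> str:
        return "".join(self._chunks.get(response_id, []))


acc = TextAccumulator()
acc.handle({"type": "response.text.text", "response_id": "resp_1", "text": "How are"})
acc.handle({"type": "response.text.text", "response_id": "resp_1", "text": " you?"})
preview_text = acc.preview("resp_1")
acc.handle({"type": "response.text.done", "response_id": "resp_1", "text": "How are you?"})
```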

response.audio.delta

When the output modalities include audio, the server returns this event as the model generates audio data incrementally.

event_id string

The unique identifier of this event.

{
    "event_id": "event_B1osWMZBtrEQbiIwW0qHQ",
    "type": "response.audio.delta",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
    "output_index": 0,
    "content_index": 0,
    "delta": "UklGRnoGAABXQVZFZm10IBAAAAAB..."
}

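type string

The event type. The value is always response.audio.delta.

response_id string

The unique identifier of the response.

item_id string

The unique identifier of the message item.

output_index integer

The value is always 0.

content_index integer

The value is always 0.

delta string

The incremental audio data output by the model. The data is Base64-encoded.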

response.audio.done

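When the output modalities include audio, the server returns this event after audio generation is complete.

The server also returns this event if the response is interrupted, incomplete, or canceled.

This event does not contain the complete audio data.

event_id string

The unique identifier of this event.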

{
    "event_id": "event_B1osWMWoDRYyITDyNYcBu",
    "type": "response.audio.done",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
    "output_index": 0,
    "content_index": 0
}

type string

The event type. The value is always response.audio.done.

response_id string

The unique identifier of the response.

item_id string

The unique identifier of the message item.

output_index integer

The value is always 0.

content_index integer

The value is always 0.
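
Since response.audio.done does not carry the complete audio, the client has to assemble it from the response.audio.delta events. The sketch below decodes each Base64 delta and appends the bytes to a buffer. AudioBuffer is a hypothetical class; what is done with the resulting bytes (playback, writing to a file) is left out, and the sample payloads are placeholder bytes rather than real audio.

```python
import base64


class AudioBuffer:
    """Collect decoded audio bytes from response.audio.delta events."""

    def __init__(self) -> None:
        self._pcm = bytearray()
        self.finished = False

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "response.audio.delta":
            # `delta` is Base64 text; decode it to raw bytes before buffering.
            self._pcm.extend(base64.b64decode(event["delta"]))
        elif kind == "response.audio.done":
            self.finished = True

    def data(self) -> bytes:
        return bytes(self._pcm)


buffer = AudioBuffer()
for chunk in (b"\x00\x01", b"\x02\x03"):  # placeholder bytes, not real audio
    buffer.handle({
        "type": "response.audio.delta",
        "delta": base64.b64encode(chunk).decode("ascii"),
    })
buffer.handle({"type": "response.audio.done"})
```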

conversation.item.input_audio_transcription.text

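When the input_audio_transcription.model parameter is set, the server streams the speech recognition results for the input audio as text in the original source language.

event_id string

The unique identifier of this event.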

{
    "event_id": "event_xxx",
    "type": "conversation.item.input_audio_transcription.text",
    "item_id": "item_xxx",
    "content_index": 0,
    "text": "",
    "stash": "The weather is really nice today",
    "language": "zh"
}

type string

The event type. The value is always conversation.item.input_audio_transcription.text.

item_id string

The unique identifier of the message item.

content_index integer

Currently, the value is always 0.

text string

The confirmed recognized text.

stash string

Recognized text that is not yet confirmed. Later events may revise this text.

language string

The detected source language.
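
One way to show the difference between text and stash in a user interface is to render the unconfirmed part distinctly. The helper below is a minimal sketch that puts the stash in square brackets; live_caption is a hypothetical name, and the bracket convention is only an illustration.

```python
def live_caption(event: dict) -> str:
    """Confirmed text followed by the unconfirmed tail in square brackets."""
    confirmed = event.get("text", "")
    pending = event.get("stash", "")
    return confirmed + (f"[{pending}]" if pending else "")


caption = live_caption({
    "type": "conversation.item.input_audio_transcription.text",
    "item_id": "item_xxx",
    "content_index": 0,
    "text": "",
    "stash": "The weather is really nice today",
    "language": "zh",
})
```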

conversation.item.input_audio_transcription.completed

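When the input_audio_transcription.model parameter is set, the server returns this event after speech recognition is complete. The event contains the final, complete recognition result.

event_id string

The unique identifier of this event.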

{
    "event_id": "event_xxx",
    "type": "conversation.item.input_audio_transcription.completed",
    "item_id": "item_xxx",
    "content_index": 0,
    "transcript": "The weather is really nice today, let's go for a walk in the park.",
    "language": "zh"
}

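type string

The event type. The value is always conversation.item.input_audio_transcription.completed.

item_id string

The unique identifier of the message item.

content_index integer

Currently, the value is always 0.

transcript string

The complete speech recognition result in the original source language.

language string

The detected source language.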

response.audio_transcript.text

When the output modalities include audio, the server returns this event so that the translation can be displayed in real time.

event_id string

The unique identifier of this event.

{
  "event_id": "event_xxx",
  "type": "response.audio_transcript.text",
  "response_id": "resp_xxx",
  "item_id": "item_xxx",
  "output_index": 0,
  "content_index": 0,
  "text": "Hello,",
  "stash": " who are you?"
}

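type string

The event type. The value is always response.audio_transcript.text.

response_id string

The unique identifier of the response.

item_id string

The unique identifier of the message item.

output_index integer

Currently, the value is always 0.

content_index integer

Currently, the value is always 0.

text string

The confirmed translated text segment.

stash string

Provisional text from the initial translation. Concatenate it with the current text to form a provisional translation result. The server keeps updating text and stash through response.audio_transcript.text events until a response.audio_transcript.done event arrives. At that point, the complete final translation is available in the transcript field.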

response.audio_transcript.done

When the output modalities include audio, the server returns this event after the model finishes generating the text.

event_id string

The unique identifier of this event.

{
    "event_id": "event_VN4Q4GJugLcc1S23viW8E",
    "type": "response.audio_transcript.done",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "item_id": "item_JvJauNH2CTXb1D9WV6pD4",
    "output_index": 0,
    "content_index": 0,
    "transcript": "How can I assist you today?"
}

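type string

The event type. The value is always response.audio_transcript.done.

response_id string

The unique identifier of the response.

item_id string

The unique identifier of the message item.

output_index integer

Currently, the value is always 0.

content_index integer

Currently, the value is always 0.

transcript string

The complete text.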
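
The relationship between response.audio_transcript.text and response.audio_transcript.done can be sketched as a small state holder: each text event replaces the provisional translation with text plus stash, and the done event replaces it with the final transcript. TranslationView is a hypothetical class written for illustration only.

```python
class TranslationView:
    """Hold the latest translation text for display."""

    def __init__(self) -> None:
        self.current = ""
        self.is_final = False

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "response.audio_transcript.text":
            # Provisional result: confirmed text followed by the stash.
            self.current = event.get("text", "") + event.get("stash", "")
            self.is_final = False
        elif kind == "response.audio_transcript.done":
            # Final result: the complete text is in `transcript`.
            self.current = event["transcript"]
            self.is_final = True


view = TranslationView()
view.handle({
    "type": "response.audio_transcript.text",
    "text": "Hello,",
    "stash": " who are you?",
})
provisional = view.current
view.handle({
    "type": "response.audio_transcript.done",
    "transcript": "Hello, who are you?",
})
```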

response.output_item.added

The server returns this event when a new output item is created during response generation.

event_id string

The unique identifier of this event.

{
    "event_id": "event_B4O5yPt3Gjnjy5eYH3plG",
    "type": "response.output_item.added",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "output_index": 0,
    "item": {
        "id": "item_OFaPGtzfWCPyGzxnuEX9i",
        "object": "realtime.item",
        "type": "message",
        "status": "in_progress",
        "role": "assistant",
        "content": []
    }
}

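type string

The event type. The value is always response.output_item.added.

response_id string

The unique identifier of the response.

output_index integer

Currently, the value is always 0.

item object

Information about the output item.

Properties

id string

The unique identifier of the output item.

type string

The value is always message.

object string

The value is always realtime.item.

status string

The status of the output item.

role string

The role of the message.

content array

The content of the message.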

response.output_item.done

The server sends this event when a new item has been completely output.

event_id string

The unique identifier of this event.

{
    "event_id": "event_XkiwbYTBC9Wcdwy6uYJ2G",
    "type": "response.output_item.done",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "output_index": 0,
    "item": {
        "id": "item_JvJauNH2CTXb1D9WV6pD4",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
            {
                "type": "audio",
                "text": "Hello, I am a large language model developed by Alibaba Cloud. My name is Qwen. How can I help you?"
            }
        ]
    }
}

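type string

The event type. The value is always response.output_item.done.

response_id string

The unique identifier of the response.

output_index integer

Currently, the value is always 0.

item object

Information about the output item.

Properties

id string

The unique identifier of the output item.

object string

The value is always realtime.item.

type string

The value is always message.

status string

The status of the output item.

role string

The role of the message sender.

content array

The content of the message.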

response.content_part.added

The server returns this event when a new content part starts to be output.

event_id string

The unique identifier of this event.

{
    "event_id": "event_J2UixwYKZsXg7c9YXZetL",
    "type": "response.content_part.added",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
    "output_index": 0,
    "content_index": 0,
    "part": {
        "type": "audio",
        "text": ""
    }
}

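type string

The event type. The value is always response.content_part.added.

response_id string

The unique identifier of the response.

item_id string

The unique identifier of the message item.

output_index integer

The value is always 0.

content_index integer

The value is always 0.

part object

Information about the content part.

Properties

type string

The type of the content part.

text string

The text of the content part.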

response.content_part.done

The server returns this event after a new content part has been completely output.

event_id string

The unique identifier of this event.

{
    "event_id": "event_VN4Q4GJugLcc1S23viW8E",
    "type": "response.content_part.done",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "item_id": "item_JvJauNH2CTXb1D9WV6pD4",
    "output_index": 0,
    "content_index": 0,
    "part": {
        "type": "audio",
        "text": "Hello, I am a large language model developed by Alibaba Cloud. My name is Qwen. How can I help you?"
    }
}

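type string

The event type. The value is always response.content_part.done.

response_id string

The unique identifier of the response.

item_id string

The unique identifier of the message item.

output_index integer

The value is always 0.

content_index integer

The value is always 0.

part object

Information about the content part.

Properties

type string

The type of the content part.

text string

The text of the content part.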
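
Every event in this topic carries a type field, so a client can route incoming messages by that value. The sketch below shows one possible dispatcher. EventRouter and its methods are hypothetical names, and the code assumes each incoming message is a single JSON object like the examples in this topic.

```python
import json
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventRouter:
    """Route decoded server events to handlers keyed by event type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self.unhandled: List[dict] = []

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type] = handler

    def dispatch(self, raw: str) -> None:
        event = json.loads(raw)
        handler = self._handlers.get(event.get("type"))
        if handler is None:
            # Keep unknown events so that new event types are not lost silently.
            self.unhandled.append(event)
        else:
            handler(event)


finished_parts: List[str] = []
router = EventRouter()
router.on(
    "response.content_part.done",
    lambda event: finished_parts.append(event["part"]["text"]),
)
router.dispatch(json.dumps({
    "type": "response.content_part.done",
    "part": {"type": "audio", "text": "Hello"},
}))
router.dispatch(json.dumps({"type": "some.future.event"}))
```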