Qwen3.5 LiveTranslate real-time audio and video translation model - Alibaba Cloud Model Studio

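qwen3.5-livetranslate-flash-realtime is a vision-enhanced real-time translation model that supports 60 languages (29 with audio and text output, 31 with text output only). It processes audio and image input from a video stream or a local file, uses the visual context to improve translation accuracy, and outputs translated text and audio in real time.

Try the online demo, which you can deploy in one click with Function Compute.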

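Key features

Multilingual support: Translates between 60 languages, including Chinese, English, French, German, Russian, Japanese, Korean, Spanish, Portuguese, and Arabic. Of these, 29 support audio and text output and 31 support text output only.
Vision-enhanced accuracy: Analyzes visual cues such as lip movements, gestures, and on-screen text to improve translation accuracy, especially in noisy environments or for ambiguous words.
Low latency: Delivers simultaneous-interpretation-level latency as low as 2.8 seconds.
High-quality simultaneous interpretation: Predicts semantic units to handle word-order differences between languages, achieving quality comparable to offline translation.
Natural voice: Automatically reproduces the intonation and emotion of the source speech.
Hotword configuration: Lets you configure hotwords to improve translation accuracy for specific terms.
Voice cloning: Clones the speaker's voice and uses it for the translated output. Supports both real-time server-side cloning and pre-registered voice profiles.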

How to use

1. Configure the connection

The model connects over WebSocket. Use the following parameters.

Parameter	Description
endpoint	China (Beijing) region: wss://dashscope.aliyuncs.com/api-ws/v1/realtime Singapore region: wss://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api-ws/v1/realtime. Replace {WorkspaceId} with your actual workspace ID.
Query parameter	The model query parameter must specify the model name. Example: `?model=qwen3.5-livetranslate-flash-realtime`
Message header	Authenticate with a Bearer token: Authorization: Bearer DASHSCOPE_API_KEY. DASHSCOPE_API_KEY is the API key that you obtained in Model Studio.

Sample Python code for the WebSocket connection:

# pip install websocket-client
import json
import websocket
import os

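API_KEY = os.getenv("DASHSCOPE_API_KEY")
if not API_KEY:
    raise RuntimeError("Set the DASHSCOPE_API_KEY environment variable first.")
# Replace {WorkspaceId} with your actual workspace ID before running.
API_URL = "wss://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api-ws/v1/realtime?model=qwen3.5-livetranslate-flash-realtime"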

headers = [
    "Authorization: Bearer " + API_KEY
]

def on_open(ws):
    print(f"Connected to server: {API_URL}")
def on_message(ws, message):
    data = json.loads(message)
    print("Received event:", json.dumps(data, indent=2))
def on_error(ws, error):
    print("Error:", error)

ws = websocket.WebSocketApp(
    API_URL,
    header=headers,
    on_open=on_open,
    on_message=on_message,
    on_error=on_error
)

ws.run_forever()
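
The sample above hard-codes the Singapore endpoint. As a minimal sketch, the URL and header from the parameter table could be assembled per region as follows. The helper names (build_realtime_url, build_headers) and the region keys ("cn-beijing", "ap-southeast-1") are hypothetical labels chosen for this illustration; they are not part of any SDK.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Endpoint templates copied from the connection parameter table.
# The dictionary keys are illustrative labels, not official region IDs.
ENDPOINTS = {
    "cn-beijing": "wss://dashscope.aliyuncs.com/api-ws/v1/realtime",
    "ap-southeast-1": "wss://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api-ws/v1/realtime",
}


def build_realtime_url(region: str, model: str, workspace_id: Optional[str] = None) -> str:
    """Return the WebSocket URL for a region, with the model query parameter."""
    template = ENDPOINTS[region]
    if "{WorkspaceId}" in template:
        if not workspace_id:
            raise ValueError("workspace_id is required for this region")
        template = template.replace("{WorkspaceId}", workspace_id)
    return f"{template}?{urlencode({'model': model})}"


def build_headers(api_key: str) -> List[str]:
    """Return headers in the 'Name: value' list form used by websocket-client."""
    if not api_key:
        raise ValueError("API key is missing; set DASHSCOPE_API_KEY")
    return [f"Authorization: Bearer {api_key}"]
```

The returned values can be passed to websocket.WebSocketApp as the URL and the header argument, in the same way as API_URL and headers in the sample.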

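2. Configure languages, modalities, and voice

Send a session.update client event that contains the following parameters.

Languages
- Source language: Set with the session.input_audio_transcription.language parameter.
  
  The default value is en (English).
- Target language: Set with the session.translation.language parameter.
  
  The default value is en (English).
See the list of supported languages.

Source-language recognition output

Setting session.input_audio_transcription.model to qwen3-asr-flash-realtime makes the server return the speech recognition result (the source text) of the input audio in addition to the translation.

The server returns the following events.
- conversation.item.input_audio_transcription.text: Streams the recognition result.
- conversation.item.input_audio_transcription.completed: Returns the final result once recognition is complete.

Output modalities

Set the session.modalities parameter to ["text"] (text only) or ["text","audio"] (text and audio).

Voice

Set with the session.voice parameter. See the list of supported voices.

Hotwords

Set hotwords with the session.translation.corpus.phrases parameter. Hotwords are key-value pairs that map a source term to its target translation, which improves translation accuracy for specific terms.

Example: map "artificial intelligence" to "Artificial Intelligence".

Voice cloning

Configure with the session.enable_voice_clone, session.voice_clone_options.frequency, and session.voice parameters. Three modes are supported: a pre-registered voice profile (frequency: never), a one-time server-side clone at session start (once), and a real-time clone before each response (always). For details, see Voice cloning.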
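
The parameters in this step can be combined into a single session.update payload. The following is a minimal sketch, assuming the field layout used by the quick start client later on this page. The function name build_session_update and its keyword arguments are hypothetical; only the JSON field names come from this page.

```python
import time
from typing import Dict, Optional


def build_session_update(
    target_language: str = "en",
    source_language: Optional[str] = None,
    audio_output: bool = True,
    voice: Optional[str] = None,
    hotwords: Optional[Dict[str, str]] = None,
    return_source_transcript: bool = False,
) -> dict:
    """Build a session.update client event as a plain dict."""
    session: dict = {
        # ["text", "audio"] returns text and synthesized audio; ["text"] returns text only.
        "modalities": ["text", "audio"] if audio_output else ["text"],
        "translation": {"language": target_language},
    }
    if voice:
        session["voice"] = voice
    if hotwords:
        # Hotwords map a source term to its target translation.
        session["translation"]["corpus"] = {"phrases": dict(hotwords)}

    transcription: dict = {}
    if source_language:
        transcription["language"] = source_language
    if return_source_transcript:
        # Setting this model makes the server also return the source text.
        transcription["model"] = "qwen3-asr-flash-realtime"
    if transcription:
        session["input_audio_transcription"] = transcription

    return {
        "event_id": f"event_{int(time.time() * 1000)}",
        "type": "session.update",
        "session": session,
    }
```

Serialize the result with json.dumps and send it over the open WebSocket before streaming any audio.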

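3. Send audio and image input

Use the input_audio_buffer.append and input_image_buffer.append events to send Base64-encoded audio and image data. Audio input is required; image input is optional.

Images can come from a local file or be captured in real time from a video stream.

The server automatically detects speech segments and triggers the model response.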
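
When the audio comes from a local file rather than a microphone, it has to be split into chunks before it is appended. The following is a minimal sketch, assuming raw 16-bit mono PCM and the 1,600-frame chunk size used by the quick start client on this page. The helper names iter_audio_events and image_event are hypothetical.

```python
import base64
from typing import Iterator

BYTES_PER_FRAME = 2  # assumption: 16-bit mono PCM, as in the quick start client


def iter_audio_events(pcm: bytes, chunk_frames: int = 1600) -> Iterator[dict]:
    """Yield one input_audio_buffer.append event per chunk of raw PCM."""
    chunk_bytes = chunk_frames * BYTES_PER_FRAME
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        yield {
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }


def image_event(image_bytes: bytes) -> dict:
    """Build an input_image_buffer.append event from encoded image bytes."""
    if not image_bytes:
        raise ValueError("image_bytes cannot be empty")
    return {
        "type": "input_image_buffer.append",
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
```

To approximate real-time input from a file, a caller would typically pause between chunks for the duration each chunk represents (0.1 seconds for 1,600 frames at 16 kHz).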

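4. Receive the model response

The model responds when the server detects the end of speech. The response format depends on the output modality.

Text-only output

The server returns the full translated text in the response.text.done event.

Text and audio output
- Text
  
  The server returns the full translated text in the response.audio_transcript.done event.
- Audio
  
  The server returns incremental Base64-encoded audio data in response.audio.delta events.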
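
The two response formats can be handled by one small dispatcher. The following is a minimal sketch, assuming the payload field names read by the quick start client on this page (text, transcript, and delta). The class name TranslationCollector is hypothetical.

```python
import base64
from typing import List


class TranslationCollector:
    """Accumulate translated text and decoded audio from server events."""

    def __init__(self) -> None:
        self.texts: List[str] = []
        self.audio = bytearray()

    def handle(self, event: dict) -> None:
        event_type = event.get("type")
        if event_type == "response.text.done":
            # Text-only modality: full translation in the "text" field.
            self.texts.append(event.get("text", ""))
        elif event_type == "response.audio_transcript.done":
            # Text and audio modality: full translation in the "transcript" field.
            self.texts.append(event.get("transcript", ""))
        elif event_type == "response.audio.delta":
            # Incremental Base64-encoded audio chunk.
            self.audio.extend(base64.b64decode(event.get("delta", "")))
        # Other event types are ignored in this sketch.
```

Call handle with each event after json.loads. The accumulated audio bytes can then be written to a file or passed to a player.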

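5. End the session

After all audio has been sent, send the session.finish client event and wait for the server to return the session.finished event before closing the WebSocket connection.

If you close the WebSocket without sending session.finish, the server's VAD cannot detect the end of the final speech segment, the translation of that segment is lost entirely, and the connection may hang indefinitely. Always send this event before disconnecting.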
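
The shutdown order described above can be sketched as a small coroutine. This is an illustration under stated assumptions, not the service's reference implementation: ws stands for any object that offers async send and close methods and async iteration over incoming messages (for example, a connection from the websockets library), and no other task is reading from the same connection. The name finish_session and the 15-second default timeout are hypothetical.

```python
import asyncio
import json
import time


async def finish_session(ws, timeout: float = 15.0) -> bool:
    """Send session.finish, wait for session.finished, then close the socket.

    Returns True if session.finished arrived before the timeout.
    """
    await ws.send(json.dumps({
        "event_id": f"event_{int(time.time() * 1000)}",
        "type": "session.finish",
    }))

    async def wait_for_finished() -> None:
        async for message in ws:
            if json.loads(message).get("type") == "session.finished":
                return
        raise ConnectionError("connection closed before session.finished")

    try:
        await asyncio.wait_for(wait_for_finished(), timeout=timeout)
        return True
    except (asyncio.TimeoutError, ConnectionError):
        return False
    finally:
        # Close only after the server confirmed completion or the wait gave up.
        await ws.close()
```

Events that arrive while waiting, such as the final translation, are discarded here. A real client would forward them to its message handler instead.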

Supported models

Model	Version	Context window (tokens)	Max input (tokens)	Max output (tokens)
qwen3.5-livetranslate-flash-realtime (alias of qwen3.5-livetranslate-flash-realtime-2026-05-19)	Stable	53,248	49,152	4,096
qwen3.5-livetranslate-flash-realtime-2026-05-19	Snapshot
qwen3-livetranslate-flash-realtime (alias of qwen3-livetranslate-flash-realtime-2025-09-22)	Stable	53,248	49,152	4,096
qwen3-livetranslate-flash-realtime-2025-09-22	Snapshot

Quick start

Prepare the environment

Python 3.10 or later is required.

First, install pyaudio.

macOS

brew install portaudio && pip install pyaudio

Debian/Ubuntu

sudo apt-get install python3-pyaudio

or

pip install pyaudio

CentOS

sudo yum install -y portaudio portaudio-devel && pip install pyaudio

Windows

pip install pyaudio

Next, install the WebSocket dependencies.

pip install websocket-client==1.8.0 websockets

Create the client

Create a file named livetranslate_client.py with the following code.

Client code - livetranslate_client.py

import os
import time
import base64
import asyncio
import json
import websockets
import pyaudio
import queue
import threading
import traceback

class LiveTranslateClient:
    def __init__(self, api_key: str, target_language: str = "en", *, audio_enabled: bool = True):
        if not api_key:
            raise ValueError("API key cannot be empty.")
            
        self.api_key = api_key
        self.target_language = target_language
        self.audio_enabled = audio_enabled
        self.ws = None
        self.api_url = "wss://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api-ws/v1/realtime?model=qwen3.5-livetranslate-flash-realtime"
        
        # Audio input configuration (from microphone)
        self.input_rate = 16000
        self.input_chunk = 1600
        self.input_format = pyaudio.paInt16
        self.input_channels = 1
        
        # Audio output configuration (for playback)
        self.output_rate = 24000
        self.output_chunk = 2400
        self.output_format = pyaudio.paInt16
        self.output_channels = 1
        
        # State management
        self.is_connected = False
        self.audio_player_thread = None
        self.audio_playback_queue = queue.Queue()
        self.pyaudio_instance = pyaudio.PyAudio()
        self.session_finished_event = asyncio.Event()

    async def connect(self):
        """Establish a WebSocket connection to the translation service."""
        headers = {"Authorization": f"Bearer {self.api_key}"}
        try:
            self.ws = await websockets.connect(self.api_url, additional_headers=headers)
            self.is_connected = True
            print(f"Successfully connected to the server: {self.api_url}")
            await self.configure_session()
        except Exception as e:
            print(f"Connection failed: {e}")
            self.is_connected = False
            raise

    async def configure_session(self):
        """Configure the translation session, setting the target language, voice, etc."""
        config = {
            "event_id": f"event_{int(time.time() * 1000)}",
            "type": "session.update",
            "session": {
                # 'modalities' controls the output type.
                # ["text", "audio"]: Returns both translated text and synthesized audio (recommended).
                # ["text"]: Returns only the translated text.
                "modalities": ["text", "audio"] if self.audio_enabled else ["text"],
                "input_audio_format": "pcm",
                "output_audio_format": "pcm",
                # 'input_audio_transcription' configures source language recognition.
                # Set 'model' to 'qwen3-asr-flash-realtime' to also output the source language recognition result.
                # "input_audio_transcription": {
                #     "model": "qwen3-asr-flash-realtime",
                #     "language": "zh"  # source language, default 'en'
                # },
                "translation": {
                    "language": self.target_language,
                    # 'corpus' configures hotwords to improve the translation accuracy of specific terms.
                    # "corpus": {
                    #     "phrases": {
                    #         "Artificial Intelligence": "Artificial Intelligence",
                    #         "Machine Learning": "Machine Learning"
                    #     }
                    # }
                }
            }
        }
        print(f"Sending session configuration: {json.dumps(config, indent=2, ensure_ascii=False)}")
        await self.ws.send(json.dumps(config))

    async def send_audio_chunk(self, audio_data: bytes):
        """Encode and send an audio chunk to the server."""
        if not self.is_connected:
            return
            
        event = {
            "event_id": f"event_{int(time.time() * 1000)}",
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(audio_data).decode()
        }
        await self.ws.send(json.dumps(event))

    async def send_image_frame(self, image_bytes: bytes, *, event_id: str | None = None):
        # Send an image frame to the server.
        if not self.is_connected:
            return

        if not image_bytes:
            raise ValueError("image_bytes cannot be empty.")

        # Encode to Base64
        image_b64 = base64.b64encode(image_bytes).decode()

        event = {
            "event_id": event_id or f"event_{int(time.time() * 1000)}",
            "type": "input_image_buffer.append",
            "image": image_b64,
        }

        await self.ws.send(json.dumps(event))

    def _audio_player_task(self):
        stream = self.pyaudio_instance.open(
            format=self.output_format,
            channels=self.output_channels,
            rate=self.output_rate,
            output=True,
            frames_per_buffer=self.output_chunk,
        )
        try:
            while self.is_connected or not self.audio_playback_queue.empty():
                try:
                    audio_chunk = self.audio_playback_queue.get(timeout=0.1)
                    if audio_chunk is None: # Termination signal
                        break
                    stream.write(audio_chunk)
                    self.audio_playback_queue.task_done()
                except queue.Empty:
                    continue
        finally:
            stream.stop_stream()
            stream.close()

    def start_audio_player(self):
        """Start the audio player thread (only when audio output is enabled)."""
        if not self.audio_enabled:
            return
        if self.audio_player_thread is None or not self.audio_player_thread.is_alive():
            self.audio_player_thread = threading.Thread(target=self._audio_player_task, daemon=True)
            self.audio_player_thread.start()

    async def handle_server_messages(self, on_text_received):
        """Handle incoming messages from the server in a loop."""
        try:
            async for message in self.ws:
                event = json.loads(message)
                event_type = event.get("type")
                if event_type == "response.audio.delta" and self.audio_enabled:
                    audio_b64 = event.get("delta", "")
                    if audio_b64:
                        audio_data = base64.b64decode(audio_b64)
                        self.audio_playback_queue.put(audio_data)

                elif event_type == "response.done":
                    print("\n[INFO] Response round complete.")
                    usage = event.get("response", {}).get("usage", {})
                    if usage:
                        print(f"[INFO] token usage: {json.dumps(usage, indent=2, ensure_ascii=False)}")
                elif event_type == "session.finished":
                    print("[INFO] Session finished.")
                    self.session_finished_event.set()
                # Process source language recognition results (requires enabling input_audio_transcription.model)
                # elif event_type == "conversation.item.input_audio_transcription.text":
                #     stash = event.get("stash", "")  # Pending recognition text
                #     print(f"[Recognizing] {stash}")
                # elif event_type == "conversation.item.input_audio_transcription.completed":
                #     transcript = event.get("transcript", "")  # Complete recognition result
                #     print(f"[Source language] {transcript}")
                elif event_type == "response.audio_transcript.done":
                    print("\n[INFO] Translation complete.")
                    text = event.get("transcript", "")
                    if text:
                        print(f"[INFO] Translated text: {text}")
                elif event_type == "response.text.done":
                    print("\n[INFO] Translation complete.")
                    text = event.get("text", "")
                    if text:
                        print(f"[INFO] Translated text: {text}")

        except websockets.exceptions.ConnectionClosed as e:
            print(f"[WARNING] Connection closed: {e}")
            self.is_connected = False
        except Exception as e:
            print(f"[ERROR] An unexpected error occurred while processing messages: {e}")
            traceback.print_exc()
            self.is_connected = False

    async def start_microphone_streaming(self):
        """Capture audio from the microphone and stream it to the server."""
        stream = self.pyaudio_instance.open(
            format=self.input_format,
            channels=self.input_channels,
            rate=self.input_rate,
            input=True,
            frames_per_buffer=self.input_chunk
        )
        print("Microphone is on. Start speaking...")
        try:
            while self.is_connected:
                # get_running_loop() is the recommended call inside a coroutine.
                audio_chunk = await asyncio.get_running_loop().run_in_executor(
                    None, stream.read, self.input_chunk
                )
                await self.send_audio_chunk(audio_chunk)
        finally:
            stream.stop_stream()
            stream.close()

    async def close(self):
        """Gracefully close the connection and release resources."""
        # Send session.finish to ensure the server completes translation of the final speech segment
        if self.is_connected and self.ws:
            finish_event = {
                "event_id": f"event_{int(time.time() * 1000)}",
                "type": "session.finish",
            }
            await self.ws.send(json.dumps(finish_event))
            print("Sent session.finish, waiting for server to finish processing...")
            try:
                await asyncio.wait_for(self.session_finished_event.wait(), timeout=15)
                print("Server processing complete.")
            except asyncio.TimeoutError:
                print("Timed out waiting for session.finished.")
        self.is_connected = False
        if self.ws:
            await self.ws.close()
            print("WebSocket connection closed.")
        
        if self.audio_player_thread:
            self.audio_playback_queue.put(None) # Send termination signal
            self.audio_player_thread.join(timeout=1)
            print("Audio player thread stopped.")
            
        self.pyaudio_instance.terminate()
        print("PyAudio instance released.")

Interact with the model

In the same directory, create a file named main.py with the following code.

main.py

import os
import asyncio
from livetranslate_client import LiveTranslateClient

def print_banner():
    print("=" * 60)
    print("  Powered by Qwen qwen3.5-livetranslate-flash-realtime")
    print("=" * 60 + "\n")

def get_user_config():
    """Get user configuration."""
    print("Select a mode:")
    print("1. Voice + Text [Default] | 2. Text Only")
    mode_choice = input("Enter your choice (press Enter for Voice + Text): ").strip()
    audio_enabled = (mode_choice != "2")

    if audio_enabled:
        lang_map = {
            "1": "en", "2": "zh", "3": "ru", "4": "fr", "5": "de", "6": "pt",
            "7": "es", "8": "it", "9": "ko", "10": "ja", "11": "yue"
        }
        print("Select the target language (Voice + Text mode):")
        print("1. English | 2. Chinese | 3. Russian | 4. French | 5. German | 6. Portuguese | 7. Spanish | 8. Italian | 9. Korean | 10. Japanese | 11. Cantonese")
    else:
        lang_map = {
            "1": "en", "2": "zh", "3": "ru", "4": "fr", "5": "de", "6": "pt", "7": "es", "8": "it",
            "9": "id", "10": "ko", "11": "ja", "12": "vi", "13": "th", "14": "ar",
            "15": "yue", "16": "hi", "17": "el", "18": "tr"
        }
        print("Select the target language (Text Only mode):")
        print("1. English | 2. Chinese | 3. Russian | 4. French | 5. German | 6. Portuguese | 7. Spanish | 8. Italian | 9. Indonesian | 10. Korean | 11. Japanese | 12. Vietnamese | 13. Thai | 14. Arabic | 15. Cantonese | 16. Hindi | 17. Greek | 18. Turkish")

    choice = input("Enter your choice (defaults to the first option): ").strip()
    target_language = lang_map.get(choice, next(iter(lang_map.values())))

    return target_language, audio_enabled

async def main():
    """Main program entry point."""
    print_banner()
    
    api_key = os.environ.get("DASHSCOPE_API_KEY")
    if not api_key:
        print("[ERROR] Please set the DASHSCOPE_API_KEY environment variable.")
        print("  For example: export DASHSCOPE_API_KEY='your_api_key_here'")
        return
        
    target_language, audio_enabled = get_user_config()
    print("\nConfiguration complete:")
    print(f"  - Target language: {target_language}")
    if not audio_enabled:
        print("  - Output mode: Text Only")

    client = LiveTranslateClient(api_key=api_key, target_language=target_language, audio_enabled=audio_enabled)
    
    # Define the callback function.
    def on_translation_text(text):
        print(text, end="", flush=True)

    try:
        print("Connecting to the translation service...")
        await client.connect()
        
        # Start audio playback based on the mode.
        client.start_audio_player()
        
        print("\n" + "-" * 60)
        print("Connection successful! Speak into the microphone.")
        print("The program will translate your speech in real time and play the translated audio. Press Ctrl+C to exit.")
        print("-" * 60 + "\n")

        # Run message handling and microphone recording concurrently.
        message_handler = asyncio.create_task(client.handle_server_messages(on_translation_text))
        tasks = [message_handler]
        # Capture audio from the microphone for translation, regardless of whether audio output is enabled.
        microphone_streamer = asyncio.create_task(client.start_microphone_streaming())
        tasks.append(microphone_streamer)

        await asyncio.gather(*tasks)

    except KeyboardInterrupt:
        print("\n\nUser interrupted. Exiting...")
    except Exception as e:
        print(f"\nA critical error occurred: {e}")
    finally:
        print("\nCleaning up resources...")
        await client.close()
        print("Program exited.")

if __name__ == "__main__":
    asyncio.run(main())

Run main.py and speak into the microphone. The model translates your speech in real time and outputs audio and text.

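Voice cloning

The model clones the speaker's voice from the input audio and uses it for the translated output. You can use a pre-registered voice profile or clone in real time on the server. This is useful for conference interpreting, live streaming, and video dubbing.

Enable voice cloning by setting the following parameters in session.update.

session.enable_voice_clone: Set to true to enable voice cloning.
session.voice_clone_options.frequency: Controls when voice cloning happens. Allowed values:
- never: No server-side cloning. A pre-registered voice profile is used instead. Set session.voice to the custom cloned voice ID.
- once: Clones the voice from the input audio once at session start and reuses it for all subsequent output. Best for single-speaker scenarios. Set session.voice to default.
- always: Clones the voice before each response and adapts dynamically to speaker changes. Best for multi-speaker conversations. Set session.voice to default.
session.voice: Specifies the output voice. The value depends on the frequency setting.
- default: Use when frequency is once or always. The server clones the speaker's voice from the input audio. The default voice is used until cloning completes.
- A custom cloned voice ID (for example, qwen-translate-vc-xxx-yyy-zzz): Use when frequency is never. You must prepare the voice in advance with the voice cloning API, setting targetModel to qwen3.5-livetranslate-flash-realtime.

When frequency is once or always, the voice parameter must be default. Any other value causes the server to return an error.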
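
Because an invalid frequency and voice combination is rejected by the server, it can be worth checking the combination on the client before sending session.update. The following is a minimal sketch of such a check. The function name voice_clone_session is hypothetical, and the rules it enforces are only the ones stated above.

```python
VALID_FREQUENCIES = ("never", "once", "always")


def voice_clone_session(frequency: str, voice: str = "default") -> dict:
    """Return the voice cloning fields of a session.update payload.

    Raises ValueError for combinations that this page describes as invalid.
    """
    if frequency not in VALID_FREQUENCIES:
        raise ValueError(f"frequency must be one of {VALID_FREQUENCIES}")
    if frequency == "never" and voice == "default":
        # 'never' relies on a pre-registered cloned voice ID.
        raise ValueError("frequency 'never' requires a custom cloned voice ID")
    if frequency in ("once", "always") and voice != "default":
        # Server-side cloning requires voice to be 'default'.
        raise ValueError("frequency 'once' or 'always' requires voice='default'")
    return {
        "modalities": ["text", "audio"],
        "voice": voice,
        "enable_voice_clone": True,
        "voice_clone_options": {"frequency": frequency},
    }
```

The returned dict can be merged into the session object together with the translation settings.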

Voice cloning configuration examples

Pre-registered voice profile (stable quality; recommended when a consistent voice identity is required):

{
    "type": "session.update",
    "session": {
        "modalities": ["text","audio"],
        "voice": "qwen-translate-vc-xxx-yyy-zzz",
        "translation": {
            "language": "en"
        },
        "enable_voice_clone": true,
        "voice_clone_options": {
            "frequency": "never"
        }
    }
}

Clone once per session on the server (best for single-speaker scenarios):

{
    "type": "session.update",
    "session": {
        "modalities": ["text","audio"],
        "voice": "default",
        "translation": {
            "language": "en"
        },
        "enable_voice_clone": true,
        "voice_clone_options": {
            "frequency": "once"
        }
    }
}

Clone before each response on the server (best for multi-speaker conversations):

{
    "type": "session.update",
    "session": {
        "modalities": ["text","audio"],
        "voice": "default",
        "translation": {
            "language": "en"
        },
        "enable_voice_clone": true,
        "voice_clone_options": {
            "frequency": "always"
        }
    }
}

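Improve translation accuracy with images

Image input helps resolve homophone ambiguity and recognize uncommon proper nouns during translation. You can send up to two images per second.

Download the following sample images: medical mask.png, masquerade mask.png

Download the following code to the same directory as livetranslate_client.py and run it. Say "What is mask?" into the microphone. The model uses the image to resolve the ambiguity: with medical mask.png the translation means "What is a medical mask?", and with masquerade mask.png it means "What is a masquerade mask?". The sample sets the target language to zh, so the output is in Chinese.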

import os
import time
import json
import asyncio
import contextlib
import functools

from livetranslate_client import LiveTranslateClient

IMAGE_PATH = "medical mask.png"
# IMAGE_PATH = "masquerade mask.png"

def print_banner():
    print("=" * 60)
    print("  Powered by Qwen qwen3.5-livetranslate-flash-realtime — single-turn interaction example (mask)")
    print("=" * 60 + "\n")

async def stream_microphone_once(client: LiveTranslateClient, image_bytes: bytes):
    pa = client.pyaudio_instance
    stream = pa.open(
        format=client.input_format,
        channels=client.input_channels,
        rate=client.input_rate,
        input=True,
        frames_per_buffer=client.input_chunk,
    )
    print(f"[INFO] Recording started. Please speak...")
    loop = asyncio.get_running_loop()
    last_img_time = 0.0
    frame_interval = 0.5  # 2 fps
    try:
        while client.is_connected:
            data = await loop.run_in_executor(None, stream.read, client.input_chunk)
            await client.send_audio_chunk(data)

            # Append an image frame every 0.5 seconds
            now = time.time()
            if now - last_img_time >= frame_interval:
                await client.send_image_frame(image_bytes)
                last_img_time = now
    finally:
        stream.stop_stream()
        stream.close()

async def main():
    print_banner()
    api_key = os.environ.get("DASHSCOPE_API_KEY")
    if not api_key:
        print("[ERROR] Please set the DASHSCOPE_API_KEY environment variable.")
        return

    client = LiveTranslateClient(api_key=api_key, target_language="zh", audio_enabled=True)

    def on_text(text: str):
        print(text, end="", flush=True)

    message_task = None  # stays None if the connection fails
    try:
        await client.connect()
        client.start_audio_player()
        message_task = asyncio.create_task(client.handle_server_messages(on_text))
        with open(IMAGE_PATH, "rb") as f:
            img_bytes = f.read()
        await stream_microphone_once(client, img_bytes)
        await asyncio.sleep(15)
    finally:
        await client.close()
        # Guard against an unbound task when connect() raised before it was created.
        if message_task is not None and not message_task.done():
            message_task.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await message_task

if __name__ == "__main__":
    asyncio.run(main())

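One-click Function Compute deployment

To deploy the application, follow these steps.

1. Open the Function Compute template, enter your API key, and click [Create and Deploy Default Environment] to test the application.
2. Wait about one minute. Get the endpoint under [Environment Details] > [Environment Context], and change the protocol from http to https (for example, https://qwen-livetranslate-flash-realtime-intl.fcv3.xxx.ap-southeast-1.fc.devsapp.net/). Open the URL in a browser and interact with the model.

Important
This endpoint uses a self-signed certificate and is intended for temporary testing only. The browser shows a security warning on first access; this is expected. Follow the prompts on the warning page to proceed (for example, Advanced → Proceed to site (unsafe)). Do not use this endpoint in production.

If you are prompted to configure Resource Access Management permissions, follow the on-screen instructions.

To view the project source code, go to [Resource Information] > [Function Resources].

Function Compute and Model Studio provide free quotas for new users, which are sufficient for basic debugging. After the free quota is used up, pay-as-you-go billing applies.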

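Interaction flow

Translation uses an event-driven WebSocket model. The server automatically detects speech segments and responds.

Lifecycle	Client events	Server events
Session initialization	session.update: configures the session	session.created: the session was created; session.updated: the session configuration was updated
User audio input	input_audio_buffer.append: appends audio to the buffer	None
Server audio output	None	response.created: the server has started generating a response; response.output_item.added: a new output item is available; response.content_part.added: a new content part was added to the assistant message; response.audio_transcript.text: contains an incremental update of the text transcript; response.audio.delta: contains an incremental chunk of synthesized audio; response.audio_transcript.done: the full text transcript is complete; response.audio.done: the synthesized audio is complete; response.content_part.done: a text or audio content part of the assistant message is complete; response.output_item.done: the whole output item of the assistant message is complete; response.done: the whole response is complete
Session end	session.finish: tells the server that audio input is complete	session.finished: server processing is complete and the session has ended

After all audio has been sent, send the session.finish event and wait for session.finished before closing the WebSocket. If you close the connection without sending session.finish, the server's VAD cannot detect the end of the final speech segment, and the translation of that segment is lost entirely.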

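API

See the Qwen-Livetranslate-Realtime API reference.

Billing

Qwen3.5-LiveTranslate-Flash-Realtime

Audio: 7 tokens per second of input audio and 12.5 tokens per second of output audio.
Image: 0.5 tokens per 32×32 pixels.
Text: When source-language speech recognition is enabled, the transcript of the input audio is returned in addition to the translation. This transcript is billed as output text tokens.

Qwen3-LiveTranslate-Flash-Realtime

Audio: 12.5 tokens per second of input or output audio.
Image: 0.5 tokens per 28×28 pixels.
Text: When source-language speech recognition is enabled, the transcript of the input audio is returned in addition to the translation. This transcript is billed as output text tokens.

Pricing: see the model list.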
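
As a worked example of the Qwen3.5-LiveTranslate-Flash-Realtime rates above, the following sketch estimates token consumption for a session. The function name estimate_tokens is hypothetical, and counting a partially covered 32×32 block as a full block is an assumption made for this illustration; the page does not state how partial blocks are rounded. Text tokens are not estimated.

```python
import math
from typing import Iterable, Tuple

AUDIO_IN_TOKENS_PER_SEC = 7.0     # input audio rate stated for qwen3.5
AUDIO_OUT_TOKENS_PER_SEC = 12.5   # output audio rate stated for qwen3.5
IMAGE_TOKENS_PER_BLOCK = 0.5      # per 32x32 pixel block
BLOCK = 32


def estimate_tokens(
    input_audio_sec: float,
    output_audio_sec: float,
    image_sizes: Iterable[Tuple[int, int]] = (),
) -> dict:
    """Estimate audio and image tokens; image_sizes holds (width, height) pairs."""
    image_tokens = sum(
        # Assumption: partial blocks are rounded up to a full block.
        math.ceil(w / BLOCK) * math.ceil(h / BLOCK) * IMAGE_TOKENS_PER_BLOCK
        for w, h in image_sizes
    )
    return {
        "input_audio": input_audio_sec * AUDIO_IN_TOKENS_PER_SEC,
        "output_audio": output_audio_sec * AUDIO_OUT_TOKENS_PER_SEC,
        "image": image_tokens,
    }
```

For example, 60 seconds of input audio, 60 seconds of output audio, and one 640×480 image give 420 input audio tokens, 750 output audio tokens, and 20 × 15 × 0.5 = 150 image tokens under these assumptions.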

Supported languages

Use the following language codes to specify the source and target languages.

Some target languages support text output only. The legacy model qwen3-livetranslate-flash-realtime supports only the following 18 languages: en, zh, ru, fr, de, pt, es, it, id, ko, ja, vi, th, ar, yue, hi, el, tr.

Language code	Language	Output
zh	Chinese	Audio + text
en	English	Audio + text
ar	Arabic	Audio + text
de	German	Audio + text
fr	French	Audio + text
es	Spanish	Audio + text
pt	Portuguese	Audio + text
id	Indonesian	Audio + text
it	Italian	Audio + text
ko	Korean	Audio + text
ru	Russian	Audio + text
th	Thai	Audio + text
vi	Vietnamese	Audio + text
ja	Japanese	Audio + text
tr	Turkish	Audio + text
hi	Hindi	Audio + text
ms	Malay	Audio + text
nl	Dutch	Audio + text
ur	Urdu	Audio + text
nb	Norwegian (Bokmål)	Audio + text
sv	Swedish	Audio + text
da	Danish	Audio + text
he	Hebrew	Audio + text
fi	Finnish	Audio + text
pl	Polish	Audio + text
is	Icelandic	Audio + text
cs	Czech	Audio + text
fil	Filipino	Audio + text
fa	Persian	Audio + text
yue	Cantonese	Text
el	Greek	Text
af	Afrikaans	Text
ast	Asturian	Text
be	Belarusian	Text
bg	Bulgarian	Text
bn	Bengali	Text
bs	Bosnian	Text
ca	Catalan	Text
ceb	Cebuano	Text
et	Estonian	Text
gl	Galician	Text
gu	Gujarati	Text
hr	Croatian	Text
hu	Hungarian	Text
jv	Javanese	Text
kk	Kazakh	Text
kn	Kannada	Text
ky	Kyrgyz	Text
lv	Latvian	Text
mk	Macedonian	Text
ml	Malayalam	Text
mr	Marathi	Text
pa	Punjabi	Text
ro	Romanian	Text
sk	Slovak	Text
sl	Slovenian	Text
sw	Swahili	Text
tg	Tajik	Text
az	Azerbaijani	Text
uk	Ukrainian	Text
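
Because some target languages are text only, a client can derive the session.modalities value from the target language instead of asking the user. The following is a minimal sketch based on the table above for qwen3.5-livetranslate-flash-realtime. The function name modalities_for is hypothetical.

```python
from typing import List

# Codes copied from the language table: 29 with audio + text, 31 with text only.
AUDIO_OUTPUT_LANGUAGES = frozenset({
    "zh", "en", "ar", "de", "fr", "es", "pt", "id", "it", "ko",
    "ru", "th", "vi", "ja", "tr", "hi", "ms", "nl", "ur", "nb",
    "sv", "da", "he", "fi", "pl", "is", "cs", "fil", "fa",
})
TEXT_ONLY_LANGUAGES = frozenset({
    "yue", "el", "af", "ast", "be", "bg", "bn", "bs", "ca", "ceb",
    "et", "gl", "gu", "hr", "hu", "jv", "kk", "kn", "ky", "lv",
    "mk", "ml", "mr", "pa", "ro", "sk", "sl", "sw", "tg", "az", "uk",
})


def modalities_for(target_language: str) -> List[str]:
    """Return the richest output modalities listed for a target language."""
    if target_language in AUDIO_OUTPUT_LANGUAGES:
        return ["text", "audio"]
    if target_language in TEXT_ONLY_LANGUAGES:
        return ["text"]
    raise ValueError(f"unsupported target language: {target_language}")
```

The result can be used directly as the modalities field of a session.update event.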

Supported voices

For the supported voices and their corresponding voice parameter values, see the voice list.