
Qwen3.5-LiveTranslate-Flash is the latest simultaneous interpretation model in the Qwen family, built on top of Qwen3.5-Omni. It delivers real-time, multimodal translation that not only hears and translates speech, but also sees and understands visual context to produce more accurate translations. Compared with its predecessor Qwen3-LiveTranslate, Qwen3.5-LiveTranslate-Flash brings major upgrades across language coverage, latency, voice cloning, and terminology handling, making it well-suited for international meetings, livestream localization, online classrooms, and business negotiations.
We evaluate Qwen3.5-LiveTranslate-Flash in both offline and real-time (streaming) settings.
On public multilingual speech translation benchmarks (FLEURS, CoVoST2), Qwen3.5-LiveTranslate-Flash achieves higher translation accuracy than mainstream commercial large speech models, significantly surpasses its predecessor, Qwen3-LiveTranslate-Flash, and delivers breakthroughs in both language coverage and translation quality.






With the Readable Unit streaming strategy, Qwen3.5-LiveTranslate-Flash reduces first-token latency by 3.45 s and per-token latency by 1.88 s compared to Qwen3-LiveTranslate-Flash, achieving an average speech-to-speech per-token latency of 2.8 s, with virtually no loss in translation quality.
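
To make the two latency figures concrete, here is a minimal sketch of how such metrics could be computed from timestamps. The post does not define its measurement protocol, so the definitions below are assumptions: first-token latency is taken as the delay from the start of speech to the first emitted output, and per-token latency as the mean delay between a source token being spoken and its translation being emitted, with a simplified one-to-one alignment. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def first_token_latency(speech_start: float, emit_times: Sequence[float]) -> float:
    """Seconds from the start of speech to the first emitted output (assumed definition)."""
    if not emit_times:
        raise ValueError("no output was emitted")
    return emit_times[0] - speech_start


def mean_per_token_latency(
    source_times: Sequence[float], emit_times: Sequence[float]
) -> float:
    """Mean delay between a source token and its emitted translation.

    Assumes a simplified one-to-one alignment between source and output
    tokens; real evaluations need an explicit alignment step.
    """
    if not source_times or len(source_times) != len(emit_times):
        raise ValueError("expected two non-empty sequences of equal length")
    return mean(e - s for s, e in zip(source_times, emit_times))
```

Under these assumed definitions, a lower value on either metric means the listener hears the translation sooner after the speaker says the corresponding words.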

Qwen3.5-LiveTranslate is a large translation model built on the Qwen3.5-Omni Thinker-Talker architecture. The Thinker receives interleaved visual and audio inputs and generates text translations, while the Talker takes the translated text and the source audio and produces speech with crosslingual voice cloning. For real-time simultaneous interpretation, we adopt a chunk-wise streaming input mechanism and introduce Readable Unit tags that control the granularity of speech synthesis, which reduces interpretation latency. In addition, dynamic crosslingual voice cloning lets the model preserve the speaker’s original vocal characteristics during real-time translation.
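
The interplay between chunk-wise input and Readable Unit tags can be sketched as follows. This is a minimal illustration under stated assumptions, not the model's actual implementation: the tag string `<ru>`, the function names, and the idea that tags arrive inline in the translated text stream are all hypothetical, since the post does not specify the tag format or an API.

```python
from typing import Iterable, Iterator, List

RU_TAG = "<ru>"  # hypothetical marker; the real tag format is not described in the post


def chunk_audio(samples: List[float], chunk_size: int) -> Iterator[List[float]]:
    """Split an audio buffer into fixed-size chunks for streaming input."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def readable_units(text_stream: Iterable[str], tag: str = RU_TAG) -> Iterator[str]:
    """Group streamed translation text into units that end at each tag.

    A unit is yielded as soon as its closing tag arrives, so a speech
    synthesizer could start on it before the full sentence is translated.
    """
    buffer = ""
    for piece in text_stream:
        buffer += piece
        while tag in buffer:
            unit, buffer = buffer.split(tag, 1)
            if unit.strip():
                yield unit.strip()
    if buffer.strip():  # flush trailing text that never received a closing tag
        yield buffer.strip()


# Usage: text pieces arrive incrementally; each completed unit could be
# handed to speech synthesis immediately.
stream = ["Good morn", "ing,<ru> every", "one.<ru>Let's", " begin"]
for unit in readable_units(stream):
    print(unit)
```

Because the buffer accumulates pieces before searching for the tag, a tag split across two pieces is still detected. Smaller units lower the delay before speech starts, while larger units give the synthesizer more context, which is the granularity trade-off the tags are described as controlling.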

Figure: Qwen3.5-LiveTranslate model architecture overview
Compared to Qwen3-LiveTranslate, Qwen3.5-LiveTranslate significantly expands language coverage. Support for input audio and output text grows from 18 to 60 languages, and output audio support grows from 10 to 29 languages, enabling a far wider range of cross-lingual translation combinations across global scenarios.
| | Qwen3-LiveTranslate | Qwen3.5-LiveTranslate |
|---|---|---|
| Input Modality | Audio / Video | Audio / Video |
| Inference Mode | Offline / Streaming | Offline / Streaming |
| Voice Cloning | ✗ | ✓ (3 modes: pre-registered / clone-once / real-time) |
| Hotwords | Up to 1,000 | Up to 1,000 |
| Input Audio Languages & Output Text Languages | 18 languages: Chinese, English, Russian, French, German, Portuguese, Spanish, Italian, Indonesian, Korean, Japanese, Vietnamese, Thai, Arabic, Cantonese, Hindi, Greek, Turkish | 60 languages: Afrikaans, Arabic, Asturian, Azerbaijani, Basque, Belarusian, Bengali, Bosnian, Bulgarian, Cantonese, Catalan, Cebuano, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Interlingua, Italian, Japanese, Javanese, Kannada, Kazakh, Korean, Kyrgyz, Lingala, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Norwegian Bokmål, Nynorsk, Odia, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uyghur, Vietnamese |
| Output Audio Languages | 10 languages: Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | 29 languages: Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Filipino, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian |
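
Because text output covers more languages than speech output, an application may need to check whether a target language can be spoken or only displayed. The sketch below copies the 29 output-audio languages from the table into a lookup. The constant and function names are hypothetical and are not part of any official SDK.

```python
# Output-audio languages for Qwen3.5-LiveTranslate, copied from the table above.
AUDIO_OUTPUT_LANGUAGES = frozenset({
    "Chinese", "English", "German", "Italian", "Portuguese", "Spanish",
    "Japanese", "Korean", "French", "Russian", "Thai", "Indonesian",
    "Arabic", "Vietnamese", "Turkish", "Finnish", "Polish", "Hindi",
    "Dutch", "Czech", "Urdu", "Filipino", "Swedish", "Danish", "Hebrew",
    "Icelandic", "Malay", "Norwegian", "Persian",
})


def has_speech_output(language: str) -> bool:
    """Return True if the table lists the language as an output-audio language."""
    return language.strip().title() in AUDIO_OUTPUT_LANGUAGES


# Greek appears in the text-output list but not in the audio-output list,
# so a client would fall back to showing translated text only.
for target in ("Thai", "Greek"):
    mode = "speech and text" if has_speech_output(target) else "text only"
    print(f"{target}: {mode}")
```
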
A multilingual business meeting where participants speak in different languages and switch between them mid-sentence. Qwen3.5-LiveTranslate handles code-switching, diverse accents, and domain-specific terminology in real time — delivering fluent, natural translations without missing a beat.
A real-world travel scenario powered by Qwen AI Glasses: a Chinese tourist orders food at a local restaurant in Thailand. The model performs live Thai-to-Chinese translation on-device, combining visual context from the menu with spoken dialogue to produce accurate, context-aware translations — making cross-language communication effortless on the go.
E-commerce livestream translation scenario. Qwen3.5-LiveTranslate accurately translates product specifications and numerical information, ensuring precise cross-language delivery of product parameters.
A scene from Romance of the Three Kingdoms narrated in classical Chinese (文言文). Qwen3.5-LiveTranslate accurately interprets and translates archaic Chinese prose into modern English, demonstrating its ability to handle literary and historical language beyond everyday speech.
Qwen3.5-LiveTranslate leverages visual context to resolve translation ambiguities. When a word or phrase has multiple possible meanings, the model uses what it sees — on-screen text, objects, or scene context — to select the correct interpretation, producing translations that are both accurate and contextually grounded.
We will continue exploring the capability boundaries of multimodal translation and focus on the following directions:
Feel free to cite the following article if you find Qwen3.5-LiveTranslate helpful:
@misc{qwen35livetranslateblog,
title = {Qwen3.5-LiveTranslate: From Sound to Sight, From Word to Right},
url = {https://qwen.ai/blog?id=qwen3.5-livetranslate},
author = {Qwen Team},
month = {May},
year = {2026}
}