Major Updates to Qwen Model Series; New Speech-to-Speech Model Fun-Audio-Chat-8B

Alibaba recently advanced its AI lineup with major Qwen image-editing and controllable TTS updates, open-sourced the emotion-aware speech-to-speech Fun-Audio-Chat-8B, and saw its efficient Z-Image-Turbo lead open-source text-to-image rankings.

Alibaba Introduced Updates to Qwen-Image-Edit and Qwen3-TTS Models

Alibaba unveiled two significant updates to its Qwen model series. The first, Qwen-Image-Edit-2511, represents a major leap forward from its predecessor, offering substantially improved consistency and more robust real-world image editing capabilities. Notably, the model excels in multi-person consistency, particularly in group photos and complex scenes, which enables the high-fidelity synthesis of two separate individual images into a seamless, coherent group portrait.

Qwen-Image-Edit-2511 also introduces enhanced performance in industrial and product design generation, significantly reduced image drift, and stronger identity and character consistency. Its improved geometric reasoning allows it to directly generate auxiliary construction lines, facilitating precise design work and annotation. Additionally, the model natively integrates a selection of popular community-developed LoRAs, unlocking their effects without requiring additional fine-tuning.

(Prompt: The lady is holding this cat)

In parallel, Alibaba has upgraded its Qwen3-TTS lineup with the introduction of VoiceDesign-VD-Flash, a breakthrough in controllable speech synthesis. This model supports fully customizable voice output through free-form text instructions, allowing precise control over tone, rhythm, emotion, and persona. It can even generate a unique, personalized vocal identity from scratch, without relying on preset voice templates. VoiceDesign-VD-Flash has already outperformed several leading proprietary models on role-playing benchmarks and is poised for deployment across creative industries, including audiobook narration, film and drama voiceovers, and animated character voice creation.

Alibaba Open-Sources Fun-Audio-Chat-8B: Advanced Speech-to-Speech AI Model

Alibaba has released Fun-Audio-Chat-8B, an open-source speech-to-speech model that allows direct, natural audio interactions with users. As the latest addition to Alibaba’s Fun speech LLM family, the model is designed for diverse use cases such as audio chat, emotional companionship, smart devices, and customer service automation.

A key capability of Fun-Audio-Chat-8B is emotion-aware conversation without explicit labels or prompts. The model can infer a user’s emotional state from cues such as semantics, tone, speaking rate, pauses, and emphasis, and respond with appropriate care or encouragement. It also features strong function-calling capabilities, allowing users to execute complex natural-language commands: it interprets intent and invokes the right functions to complete tasks, supporting both single calls and multiple parallel calls to turn voice interactions into actionable outcomes.
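To make the difference between a single call and multiple parallel calls concrete, here is a minimal Python sketch of how an application might execute the function calls a model requests. It is not Fun-Audio-Chat-8B's actual interface: the call format (a list of dictionaries with `name` and `arguments`), the `dispatch` helper, and the two example tools are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tools; names and signatures are illustrative only.
def set_alarm(time: str) -> str:
    return f"alarm set for {time}"

def get_weather(city: str) -> str:
    return f"weather for {city}: sunny"

TOOLS = {"set_alarm": set_alarm, "get_weather": get_weather}

def dispatch(calls):
    """Run one or more requested calls concurrently; results keep request order."""
    def run(call):
        fn = TOOLS.get(call["name"])
        if fn is None:
            return {"name": call["name"], "error": "unknown function"}
        try:
            return {"name": call["name"], "result": fn(**call["arguments"])}
        except Exception as exc:  # report the failure instead of crashing the turn
            return {"name": call["name"], "error": str(exc)}

    if not calls:
        return []
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        return list(pool.map(run, calls))

# A single call (assumed format for what the model emits).
single = dispatch([{"name": "get_weather", "arguments": {"city": "Hangzhou"}}])

# Two parallel calls from one spoken request, e.g.
# "Wake me at seven and tell me the weather in Hangzhou."
parallel = dispatch([
    {"name": "set_alarm", "arguments": {"time": "07:00"}},
    {"name": "get_weather", "arguments": {"city": "Hangzhou"}},
])
```

In this sketch the results come back in the same order as the requests, so an application could pass them to the model to compose a single spoken reply.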

The model addresses critical technical challenges that have plagued previous joint speech-text models. By introducing Dual-Resolution Speech Representations, Fun-Audio-Chat-8B reduces compute demands by up to 50% while maintaining high speech quality. The innovative Core-Cocktail training strategy is designed to preserve text LLM capabilities during multimodal training, mitigating temporal resolution mismatch and catastrophic interference. A multi-stage, multi-task post-training process further aligns responses with human preferences for both meaning and emotional nuance.

Fun-Audio-Chat-8B has demonstrated exceptional performance across multiple benchmarks, including OpenAudioBench, VoiceBench, and UltraEval-Audio, outperforming all comparable open-source models in its parameter class. The model is now freely available to the broader AI community through GitHub, Hugging Face, and ModelScope.

This article was originally published on Alizila.
