
Alibaba Cloud Launches Qwen2-Audio Model to Analyze Speech and Audio

Alibaba Cloud’s open-sourced Qwen2-Audio is the latest iteration of its large audio-language model, which can process audio and text input and generate text output.


The model understands more than eight languages and dialects, including Mandarin, Cantonese, English, French, Italian, Spanish, German, and Japanese.

Trained on a larger volume of data than its predecessor, Qwen2-Audio supports seamless voice and text interactions with users, handling both voice chat and audio analysis tasks. The model can transcribe speech and identify information across a wide range of sounds, including spoken words, music, and ambient noise.
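Because Qwen2-Audio is open-sourced, the audio analysis workflow described above can be tried directly. The sketch below is a minimal example assuming the Hugging Face transformers integration and the "Qwen/Qwen2-Audio-7B-Instruct" checkpoint ID; the audio file is hypothetical, and exact class and argument names (such as `audios=`) may differ between library versions.

```python
# Minimal sketch: ask Qwen2-Audio to analyze a local audio clip.
# Assumptions: the "Qwen/Qwen2-Audio-7B-Instruct" checkpoint and the
# Qwen2AudioForConditionalGeneration / AutoProcessor classes from the
# public Hugging Face release; "sample.wav" is a placeholder file.
import librosa
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

model_id = "Qwen/Qwen2-Audio-7B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2AudioForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# A single-turn conversation: one audio clip plus a text instruction.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio_url": "sample.wav"},
            {"type": "text", "text": "Transcribe the speech and describe any background sounds."},
        ],
    }
]

# Build the chat prompt and load the audio at the sampling rate the model expects.
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audio, _ = librosa.load("sample.wav", sr=processor.feature_extractor.sampling_rate)

inputs = processor(text=prompt, audios=[audio], return_tensors="pt", padding=True).to(model.device)

# Generate a text answer and strip the prompt tokens from the output.
output_ids = model.generate(**inputs, max_new_tokens=256)
output_ids = output_ids[:, inputs.input_ids.shape[1]:]
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```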

Qwen2-Audio achieves state-of-the-art performance in tests focused on audio-centric instruction-following capabilities.

Since many previous test datasets are highly limited and cannot adequately reflect performance in real-world scenarios, the Qwen team also launched a benchmark designed to evaluate the ability of large audio language models to understand various types of audio signals.

At the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), hosted in Thailand last week, the Qwen team's study on benchmarking large audio-language models was accepted as a main conference paper.

In total, 38 papers from Alibaba Cloud were accepted at ACL 2024, the premier conference for natural language processing research.


This article was originally published on Alizila, written by Elizabeth Utley and Ivy Yu.
