Photo credit: Shutterstock
Alibaba Group's research institute DAMO Academy unveiled on Monday two large language models designed to reflect Southeast Asia's diverse linguistic and cultural landscape.
DAMO Academy released a model called SeaLLM and a conversationally fine-tuned version called SeaLLM-chat.
The models, which both come in two sizes, 13 billion and 7 billion parameters, are capable of processing local languages including Vietnamese, Indonesian, Thai, Malay, Khmer, Lao, Tagalog, and Burmese. Both can perform tasks in ways that better align with local customs, style and legal stipulations.
The initiative comes amid rising demand for more locally relevant LLMs from Southeast Asian countries. Singapore, for example, has created a $52 million AI initiative to develop the Lion City's research and engineering capabilities in multi-modal LLMs.
Alibaba said the launches were designed to create more inclusive and regionally relevant LLMs that reflect the cultural nuances of Southeast Asia. Most LLMs originate from Western countries and are trained on datasets weighted disproportionately toward English and Latin-based languages.
“This innovation is set to hasten the democratization of AI, empowering communities historically underrepresented in the digital realm,” said Bing Lidong, Director of the Language Technology Lab at Alibaba's DAMO Academy.
DAMO Academy has open-sourced the models on Hugging Face, making them freely available for research and commercial use.
Trained on a diverse set of Southeast Asian languages, SeaLLM can interpret and process text up to nine times longer than models like ChatGPT for non-Latin languages, and has more complex task execution capabilities. It outperforms most open-source LLMs in understanding a wide spectrum of subjects, from science, chemistry and physics to economics, in the region's languages.
The model outperforms other existing models in machine translation between English and low-resource languages such as Lao and Khmer, for which only limited data is available to train conversational AI systems. It also delivers performance on par with state-of-the-art models in most high-resource languages, such as Vietnamese and Indonesian, for which many training data sources exist.
Through pre-training enhancements and culturally tailored fine-tuning, the AI assistant powered by SeaLLM-chat can comprehend, respect and accurately reflect the cultural context of the languages in the region, including social norms, linguistic preferences and legal considerations.
“This initiative has the potential to unlock new opportunities for millions who speak languages beyond English and Chinese. Alibaba's efforts in championing inclusive technology have now reached a milestone with SeaLLM's launch,” said Luu Anh Tuan, Assistant Professor in the School of Computer Science and Engineering (SCSE) at Nanyang Technological University, Alibaba's long-term partner in multi-language AI studies.
The culturally attuned LLMs can also help companies doing business in Southeast Asian markets build their own chatbot assistants.
This article was originally published on Alizila, written by Ivy Yu.