Alibaba Open-Sources DFSMN, Its New-Generation Self-Developed Speech Recognition Model

Recently, the Machine Intelligence Laboratory of Alibaba DAMO Academy released DFSMN, a new-generation speech recognition model that raised the record for speech recognition accuracy to 96.04% (measured on LibriSpeech, the world's largest free speech recognition dataset).
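The article does not say how the 96.04% figure is computed. A common convention in speech recognition, assumed here, is to report accuracy as 100% minus the word error rate (WER), where WER is the word-level edit distance between the reference transcript and the recognizer's output divided by the number of reference words. Under that assumption, 96.04% accuracy corresponds to a WER of 3.96%. The sketch below illustrates the calculation; the function names and sample sentences are hypothetical and are not taken from the DFSMN release.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def accuracy_percent(wer: float) -> float:
    """Assumed convention: accuracy (%) = 100 * (1 - WER)."""
    return 100.0 * (1.0 - wer)


if __name__ == "__main__":
    # Hypothetical transcripts: one substituted word out of six.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.4f}, accuracy: {accuracy_percent(wer):.2f}%")
    # A WER of 3.96% maps to the accuracy figure quoted in the article.
    print(f"{accuracy_percent(0.0396):.2f}%")
```

Note that WER can exceed 1.0 when the hypothesis contains many extra words, so accuracy defined this way can be negative for very poor output.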

The Speech Recognition Team of the Machine Intelligence Laboratory of DAMO Academy led the research and development of this model and announced that it will be open-sourced to enterprises and individuals around the world. Compared with LSTM, the model most widely used in the industry, DFSMN trains faster and recognizes speech more accurately. In smart speakers and smart home devices that use the new DFSMN model, deep learning training is 3 times faster and speech recognition is 2 times faster than with the previous generation of technology.

Development pattern for contributors

*Create a personal fork of the main Kaldi repository in GitHub.
*Make your changes in a named branch other than master, for example a branch called my-awesome-feature.
*Generate a pull request through the Web interface of GitHub.
*As a general rule, please follow the Google C++ Style Guide. There are a few exceptions in Kaldi. You can use Google's style checker to verify that your code is free of basic mistakes.

Platform specific notes

PowerPC 64-bit little-endian (ppc64le)

*Kaldi is expected to work out of the box in RHEL >= 7 and Ubuntu >= 16.04 with OpenBLAS, ATLAS, or CUDA.
*CUDA drivers for ppc64le can be found at
*An IBM Redbook is available as a guide to install and configure CUDA.


Kaldi supports cross-compiling for Android using the Android NDK, clang++ and OpenBLAS.
See this blog post for details.

Alibaba has open-sourced its self-developed DFSMN speech recognition model on GitHub

At the recently held Yunqi Conference Wuhan Summit, an "AI cashier" equipped with the DFSMN speech recognition model competed head-to-head with a human clerk. In a noisy environment, it accurately recognized customers' spoken orders and took orders for 34 cups of coffee within 49 seconds. In addition, automatic ticket vending machines equipped with this speech recognition technology are already in service in the Shanghai Metro.

A well-known speech recognition expert and professor at Northwestern Polytechnical University said: "Alibaba's open-source DFSMN model is a breakthrough in the steady improvement of speech recognition accuracy. It is one of the most representative results of deep learning in the field of speech recognition in recent years, and it has a huge impact on the global academic community and on the application of AI technology." According to industry insiders, DFSMN is expected to become one of the most important acoustic models in the global speech recognition field, after the traditional LSTM model.
