Applications of NLP and Voice Recognition

NLP works closely with speech/voice recognition and text recognition engines. Now NLP and associated AI technologies have entered the consumer realm.

Natural language processing (NLP) refers to the evolving set of computer and AI-based technologies that allow computers to learn, understand, and produce content in human languages. The technology works closely with speech/voice recognition and text recognition engines. While text/character recognition and speech/voice recognition allow computers to take in information, NLP allows them to make sense of it.

Though scientists and researchers have done a lot of theoretical work on NLP in the past, we have only recently started seeing its real-world use cases. NLP-based systems are augmenting both human-human communication (e.g., language translation) and human-machine communication (e.g., virtual assistants).

In 2011, IBM Watson beat its human competitors on the popular US quiz show Jeopardy!, and the news went viral almost instantly. Unlike board games, Jeopardy! poses major language challenges for an AI machine. Watson answered complex riddles and questions on the show, displaying its prowess in understanding language. The researchers spent more than three years preparing Watson for the show.

Since Watson's accomplishments, NLP and associated AI technologies have entered the consumer realm. Many major enterprises now deploy intelligent chatbots for customer support. These chatbots can answer routine queries, help with ticketing, and resolve issues faster. Businesses are also experimenting with recruitment portals that use NLP to sift through large numbers of job applications and find better candidates.

Today, NLP-based machine translation (MT) has become highly efficient and can deliver translations quickly. Tourists can use several mobile apps that rely on MT to understand foreign languages while traveling. Businesses are implementing NLP solutions for social listening, customer communications, and crisis management. NLP is also improving spam filters, thus protecting users from fraud and unwanted emails.

Many financial market movements are driven by global events, political developments, government policy announcements, and the general economic environment in a region. NLP-based systems can read the news, press releases, and other financial reports to assess this environment. This ability makes automated financial advisors more efficient.

In July 2017, Alibaba Cloud released a smart speaker system named Tmall Genie for sale on its e-commerce portal Tmall. The system uses NLP and the AliGenie voice assistant to receive customers' requests in Mandarin Chinese. Customers can use Tmall Genie to control smart home devices, search for and play music, get the latest news, and perform several other tasks. The speaker activates on hearing “Tmall Genie,” recognizes customers through voiceprint recognition, and allows them to place orders on Tmall.

In recent years, Alibaba Cloud has invested significantly in augmenting its Big Data and AI capabilities. It offers a comprehensive suite of AI-based capabilities, including NLP, intelligent voice recognition, facial recognition, image recognition, and video recognition.

Please go to Breaking the Communication Barriers with Natural Language Processing (NLP) to learn more about the evolution of natural language processing and to explore Alibaba Cloud's NLP capabilities.

Related Blog Posts

Core Technologies of Alibaba Cloud Voice Recognition Model

Voice recognition plays an important role in AI and man-machine interaction. It provides the voice interaction capability for smart IoT home appliances and is also applicable to public services and smart government affairs.

Typically, a modern voice recognition system consists of three core components: an acoustic model, a language model, and a decoder. This architecture remains the most popular and widely used in the field of voice recognition, though recent research attempts to build end-to-end voice recognition systems. The acoustic model is mainly used to create a probability mapping between voice input and acoustic unit output. The language model describes the probability of different word combinations so that recognized sentences sound more like natural text. The decoder combines the probability values of acoustic units with the language model scores, ranks the candidate matches, and outputs the recognition result with the highest probability.
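The way a decoder weighs the two models against each other can be illustrated with a small Python sketch. This is only a toy under stated assumptions, not how Alibaba Cloud's decoder is implemented: the candidate sentences, the acoustic log-probabilities, the bigram table, the UNSEEN_PROB fallback, and the lm_weight parameter are all hypothetical values invented for illustration. A real decoder searches a very large hypothesis space rather than a two-entry dictionary.

```python
import math

# Hypothetical acoustic-model scores: log-probability of the audio
# given each candidate word sequence (values invented for illustration).
ACOUSTIC_LOGPROB = {
    ("recognize", "speech"): -4.0,
    ("wreck", "a", "nice", "beach"): -3.8,
}

# Hypothetical bigram language model: P(word | previous word).
BIGRAM_PROB = {
    ("<s>", "recognize"): 0.02,
    ("recognize", "speech"): 0.30,
    ("<s>", "wreck"): 0.001,
    ("wreck", "a"): 0.10,
    ("a", "nice"): 0.05,
    ("nice", "beach"): 0.02,
}
UNSEEN_PROB = 1e-6  # assumed fallback probability for unseen bigrams


def language_model_logprob(words):
    """Sum the log-probabilities of each word given the previous word."""
    total = 0.0
    previous = "<s>"  # sentence-start marker
    for word in words:
        total += math.log(BIGRAM_PROB.get((previous, word), UNSEEN_PROB))
        previous = word
    return total


def decode(candidates, lm_weight=1.0):
    """Return the candidate with the highest combined log score."""
    def combined_score(words):
        return candidates[words] + lm_weight * language_model_logprob(words)
    return max(candidates, key=combined_score)


best = decode(ACOUSTIC_LOGPROB)
print(" ".join(best))
```

In this toy setup the acoustic scores alone slightly favor the second candidate, but once the language model scores are added, the more natural sentence wins. Setting lm_weight to 0 shows the acoustic-only choice.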

Voice recognition benefits from deep learning technology, which has become increasingly popular in recent years. The HMM-DNN acoustic model can replace the traditional HMM-GMM acoustic model to improve the accuracy of voice recognition by 20%. The NN-LM language model can work with the traditional N-Gram language model to further improve accuracy. Compared with language models, acoustic models are more compatible with deep neural network models and therefore attract more research attention.
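One common way for a neural language model to work alongside an N-Gram model is linear interpolation of their next-word probabilities. The sketch below assumes that approach purely for illustration; the source does not say which combination method Alibaba Cloud uses, and the function name, the nn_weight parameter, and the probability values are hypothetical.

```python
def interpolate(p_ngram, p_nnlm, nn_weight=0.5):
    """Mix two next-word distributions: nn_weight * NN-LM + (1 - nn_weight) * N-Gram."""
    if not 0.0 <= nn_weight <= 1.0:
        raise ValueError("nn_weight must be between 0 and 1")
    vocabulary = set(p_ngram) | set(p_nnlm)
    return {
        word: nn_weight * p_nnlm.get(word, 0.0)
        + (1.0 - nn_weight) * p_ngram.get(word, 0.0)
        for word in vocabulary
    }


# Invented next-word probabilities after the context "recognize".
p_ngram = {"speech": 0.6, "beach": 0.4}
p_nnlm = {"speech": 0.8, "beach": 0.2}

mixed = interpolate(p_ngram, p_nnlm, nn_weight=0.5)
print(mixed)
```

Because each input sums to 1 and the weights sum to 1, the mixed distribution also sums to 1. In practice the weight would be tuned on held-out data.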

This article summarizes a talk by Yan Zhijie, a Senior Algorithm Expert and the Chief Scientist of Man-Machine Interaction at Alibaba Cloud, on the acoustic and language models adopted by Alibaba Cloud Voice Recognition Technology, including the LC-BLSTM acoustic model, LFR-DFSMN acoustic model, and NN-LM language model.

Alibaba Discloses Future Plans for IoT and Intelligent Voice Systems

Alibaba hopes to build basic IoT infrastructure, and then connect 10 billion devices through the cloud within five years. There are three points to our line of thinking: cloud computing is the heart, AI is the brain, and IoT forms the nerves. So what do we mean by that? This view suits Alibaba, especially considering the developmental path it has taken. Alibaba started with computing and storage, moved on to traditional cloud computing, and has continued to dive deep into AI technology in recent years. Whether it is computer vision technology, speech interaction technology, NLP technology, the machine learning on which it is all founded, or management strategies, Alibaba's development has always trended toward finding ways to help AI reach more users through the nerves formed by IoT, and then produce more data and with it more value.

In addition to producing our TV box, Alibaba has also set up a joint venture with the industry's leading TV manufacturer Haier. Together, we have packaged our content, services, and voice technology and built the box into the TV, as seen in Haier's AI TV. We are gradually transitioning from the previous generation of remote controllers to fifth-generation artificial intelligence televisions such as Haier-Ali's AI TV, which the company showcased at Shanghai Fair. Through a combination of far-field voice interaction, a wake-free method, and voice recognition, we can identify the age of the user to filter inappropriate content for children, and we are gradually applying intelligent voice interaction to the home entertainment environment.

Video AI: Next-Generation Intelligent Video Production

This article introduces the application of AI in media broadcasting, as well as Alibaba Cloud's exploration of next-generation intelligent media asset production services.

Intelligent catalog: Traditional labor-intensive cataloging takes about 2 to 4 hours for a 1-hour video. In the age of the Internet and content explosion, intelligent cataloging can apply technologies such as video auto-classification, flagging, character recognition, and speech/voice recognition to generate video information, add videos to the media asset database, and make intelligent recommendations based on natural language processing (NLP) and part-of-speech filtering. The whole process is driven by algorithms, with no need for human labor.

Real-time subtitles: Unlike traditional manual conversion and translation, intelligent production can perform automatic voice-to-text conversion through automatic speech/voice recognition (ASR), store the text at the corresponding point on the timeline, and then automatically translate the text from the original language into the required language. The amount of human intervention is greatly reduced. This technology is applicable not only to offline videos, but also to live conference videos for real-time subtitle production.
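As a rough illustration of storing recognized text against the timeline, the Python sketch below turns timestamped ASR segments into the widely used SubRip (SRT) subtitle layout. The segment data and function names are hypothetical, SRT is chosen here only as a familiar format and is not mentioned in the source, and the recognition and translation steps themselves are out of scope.

```python
def format_timestamp(seconds):
    """Convert seconds to the SRT timestamp layout HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}")
    return "\n\n".join(blocks) + "\n"


# Hypothetical ASR output: each segment carries its start and end time.
asr_segments = [
    (0.0, 2.5, "Welcome to the conference."),
    (2.5, 6.0, "Today we will talk about intelligent video production."),
]

print(to_srt(asr_segments))
```

For translated subtitles, the text of each segment would be replaced by its translation before formatting, while the timestamps stay unchanged.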

Related Products

Machine Learning Platform for AI

Alibaba Cloud Machine Learning Platform for AI provides an all-in-one machine learning service that requires little technical skill from users yet delivers high-performance results. On the Machine Learning Platform for AI, you can quickly establish and deploy machine learning experiments to achieve seamless integration between algorithms and your business. Machine Learning Platform for AI is built on the full-fledged algorithm application system of Alibaba Group, and now serves tens of thousands of developers and enterprise users. You can quickly build services such as product recommendation, financial risk control, image identification, and voice recognition based on Machine Learning Platform for AI to implement artificial intelligence. Machine Learning Platform for AI also provides text processing components for NLP, including word splitting, deprecated word filtering, LDA, TF-IDF, and text summarization.
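To give a feel for what a TF-IDF component computes, here is a minimal plain-Python sketch. It does not use or reproduce the platform's components: the whitespace tokenizer, the FILTERED_WORDS list, the sample documents, and the exact weighting formula (term frequency times the natural log of the inverse document frequency) are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical list of words to filter out before scoring.
FILTERED_WORDS = {"the", "a", "is", "of", "and"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop filtered words."""
    return [word for word in text.lower().split() if word not in FILTERED_WORDS]


def tf_idf(documents):
    """Return one {term: score} dictionary per document."""
    tokenized = [tokenize(document) for document in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = ["the speaker plays music", "the speaker reads news"]
scores = tf_idf(docs)
print(scores[0])
```

A term that appears in every document, such as "speaker" here, scores zero, while terms unique to one document score higher. This is why TF-IDF is useful for picking out the words that characterize a text.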

Elastic Compute Service

Elastic Compute Service is an online computing service that offers elastic and secure virtual cloud servers to cater to all your cloud hosting needs. The gn6i compute-optimized instance family with GPUs is suitable for AI (deep learning and machine learning) inference, computer vision, voice recognition, speech synthesis, natural language processing, machine translation, and recommendation systems.

Alibaba Clouder
