Alibaba Discloses Future Plans for IoT and Intelligent Voice Systems

Yan Zhijie, Senior Staff Algorithm Engineer at Alibaba Group, recently delivered a keynote speech on developments in AI, IoT, and voice-based intelligent systems at the AITech2018 Conference.

Alibaba Group has always been at the forefront of technology and is continually making breakthroughs in artificial intelligence (AI), the Internet of Things (IoT), and voice-based intelligent systems. Yan Zhijie, Senior Staff Algorithm Engineer at Alibaba Group, recently delivered a keynote speech on developments in these areas at the AITech2018 Conference. Below is a transcript of his speech (translated and edited for readability).

Yan Zhijie's keynote speech at a conference

I am very happy to have the opportunity to talk with you about some of the work we are doing these days. Why am I talking about IoT at an event dedicated to artificial intelligence? First, Shenzhen is an extremely active city in the IoT industry. Second, it was at the Yunqi Conference in Shenzhen that Alibaba Cloud announced a new strategy: Alibaba is prepared to dive fully into the world of IoT, the field that will become our next main track after e-commerce, finance, logistics, and cloud computing. Our previous endeavors, such as TMall in e-commerce, Ant Financial in finance, Cainiao in logistics, and Alibaba Cloud in cloud computing, have already shown the level at which Alibaba Group sets its goals. So when we place IoT at that same level, it is evident that we see it as a significant avenue for our future growth.

Alibaba hopes to build basic IoT infrastructure and then connect 10 billion devices through the cloud within five years. Our thinking rests on three points: cloud computing is the heart, AI is the brain, and IoT forms the nerves. What do we mean by that? The picture fits the path Alibaba has taken. Alibaba started with computing and storage, moved on to traditional cloud computing, and in recent years has gone deep into AI technology. Whether it is computer vision, speech interaction, NLP, the machine learning on which they are all founded, or management strategy, Alibaba's development has always trended toward helping AI reach more users through the nerves of IoT, and then producing more data and, with it, more value.

Voice is the Most Natural Way to Interact with IoT

Since we are talking about connecting everything through IoT, we should first decide how people connect with their devices. As a practitioner of speech interaction, I can confidently say that speech is the most natural way for people to interact with IoT.

Why is that? Because this is how people interact with each other: the most natural way to converse is speech. We hope that people and machines will be able to interact in a similar way. This means that even if you are driving a car and your hands are busy, you can still interact with your car without looking at or touching the screen. After so many years of technological development, the results truly feel as though they have come directly out of a sci-fi movie like Star Wars. The ability of humans to interact with robots by voice was once purely a science fiction concept. However, between the time the last Star Wars sequel was shot and today, this technology has become popular and easy to realize.

Of course, this achievement rests on technological progress and has benefited from the recent strides in AI, in both sensor and perception technologies. Returning to speech interaction, today we have already crossed the basic "usable" threshold and are moving toward "useful," building a technological bridge between human-computer interaction and personalized services.


To be more specific, please take a look at this picture. The portion on the right represents all of the content and services on the internet. It's just as dazzling as when I first joined Alibaba. Our group has been plowing this field for years: we have Xiami Music for music, YouKu for video, Gaode for navigation, and Feizhu for travel. Naturally, shopping does not have to be centered on TMall, nor payments on Alipay.

Addressing Customer Needs

With so much internet content and so many services, the next question we need to address is how to bring better quality content and services to consumers. The tentacles that reach them are the IoT terminals on the left-hand side. Whether it's through the more traditional cell phone or through today's IoT devices, such as smart speakers, smart TVs, connected cars, or even robots, we use these terminals to deliver our internet content and services to our consumers. The bridge in the middle, the medium, is the natural interface between human and machine, whether through speech, computer vision, or some combination of the two. We are always working toward this big picture, developing the intermediate technologies and then constructing such a bridge.


The image above shows Alibaba's TMall Genie, an intelligent speaker similar to Alexa. This is a product of the artificial intelligence experiments conducted by the Alibaba team. During Double 11 of last year, we sold 1 million of these speakers, and we have accumulated 2 million sales to date, so using this terminal we are able to reach a large number of our users.


Alibaba is working together with SAIC Group, a leader in China's automobile manufacturing industry, to invest in building smart connected cars. Aside from navigation, a very important capability is that when the user is driving and doesn't have a free hand, they can still operate the car's smart system by speech.

Additionally, the system features navigation, karaoke, a music player, and even a feature that we have found to be particularly popular with users: the ability to play voice games while driving. One example is an idiom game. Some drivers and their passengers play this game for the entire trip. We've brought new intelligence and fun to something that has always been very homogeneous but has the potential to be the most significant IoT device: the car. Following SAIC's lead, more companies in the automotive industry have joined this big picture, including automakers such as Peugeot, Citroen, and Ford.


In this big picture, Alibaba launched its own OTT TV box called the Tmall Box. As I mentioned earlier, since we are talking about internet content and services, and we have such a good entertainment platform in YouKu, it only makes sense to provide a terminal, a TV box, that brings that content to the user. The TV box is an IoT device capable of recognizing speech commands and passing them to the TV. Since we all know that kids and seniors form the largest group of TV viewers, we need to think about how we can help them find what they want to watch, and voice is, of course, the most natural and perhaps the most convenient method.


In addition to producing our TV box, Alibaba has also set up a joint venture with the industry's leading TV manufacturer, Haier. Together, we have packaged our content, services, and voice technology and built the box into the TV itself, as seen in Haier's AI TV. We are gradually transitioning from the previous generation of remote controls to fifth-generation artificial intelligence televisions such as Haier-Ali's AI TV, which the company showcased at the Shanghai Fair. By combining far-field voice interaction, a wake-free method, and voice recognition, we are capable of identifying the age of the user to filter inappropriate content for children, and of gradually applying intelligent voice interaction to the home entertainment environment.
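The age-based filtering mentioned above can be illustrated with a minimal sketch. It assumes that some upstream voice model has already produced an estimated age for the speaker; the catalog, titles, and minimum ages below are invented for illustration and do not describe the actual Haier-Ali system.

```python
# Hypothetical catalog: titles and minimum viewing ages are invented.
CATALOG = [
    {"title": "Cartoon Hour", "min_age": 0},
    {"title": "Nature Documentary", "min_age": 0},
    {"title": "Crime Drama", "min_age": 16},
]


def visible_titles(estimated_age, catalog=CATALOG):
    """Return the titles a viewer of the estimated age may be shown.

    estimated_age is assumed to come from an upstream voice-based
    age estimator, which is outside the scope of this sketch.
    """
    return [item["title"] for item in catalog if estimated_age >= item["min_age"]]


# A young viewer sees only unrestricted titles; an adult sees everything.
child_view = visible_titles(8)
adult_view = visible_titles(30)
```

In a real product the age estimate would be uncertain, so a cautious design would probably fall back to the restricted view whenever the estimator is unsure.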


But that's not all. Intelligent speech interaction is entering the public service field. At the end of last year, Mr. Ma and the chairman of Shanghai Metro viewed a concept demonstration of voice ticketing. Tourists and people on business trips often use the ticket machines in subway stations. Such users are usually not familiar with Shanghai and only know their destination, not which line to take, where to transfer, or where to get off. It used to be almost necessary to go through an app such as Gaode, first checking the route and then buying a ticket. However, with this voice ticket vending machine, the backend can be connected to the wealth of data on the internet, such as the data that powers Gaode, so that all you have to do is tell the machine where you want to go. The machine then plans the route for you, including transfers, the final stop, the length of the trip, and the cost of the ticket. Everything is available at a glance. Then, we can use Alipay to complete the entire ticket purchasing process.
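The flow described here (destination in, then route, transfers, final stop, and fare out) can be sketched in a few lines. This is only an illustration under stated assumptions: the station graph, line names, and fare rule are hypothetical toy data rather than Shanghai Metro or Gaode data, and a plain breadth-first search stands in for whatever route planner the real backend uses.

```python
from collections import deque

# Hypothetical toy network: station -> list of (neighbor, line).
NETWORK = {
    "A": [("B", "Line 1")],
    "B": [("A", "Line 1"), ("C", "Line 1"), ("D", "Line 2")],
    "C": [("B", "Line 1")],
    "D": [("B", "Line 2"), ("E", "Line 2")],
    "E": [("D", "Line 2")],
}


def plan_route(origin, destination):
    """Breadth-first search for the route with the fewest stops.

    Returns a list of (from_station, to_station, line) legs,
    or None if the destination cannot be reached.
    """
    queue = deque([(origin, [])])
    visited = {origin}
    while queue:
        station, legs = queue.popleft()
        if station == destination:
            return legs
        for neighbor, line in NETWORK[station]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, legs + [(station, neighbor, line)]))
    return None


def summarize(legs, base_fare=3, per_stop=1):
    """Summarize a non-empty route; the fare rule is an invented example."""
    if not legs:
        raise ValueError("route must contain at least one leg")
    # A transfer happens at the start of any leg whose line differs from the previous leg.
    transfers = [
        legs[i][0] for i in range(1, len(legs)) if legs[i][2] != legs[i - 1][2]
    ]
    return {
        "stops": len(legs),
        "transfers": transfers,
        "final_stop": legs[-1][1],
        "fare": base_fare + per_stop * len(legs),
    }


# The traveler only states the destination; the machine does the rest.
ticket = summarize(plan_route("A", "E"))
```

In this toy network a trip from A to E takes three stops with one transfer at B. The payment step would follow once the summary is confirmed and is not modeled here.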

While this process may sound simple, and indeed it should be, the technology also has to solve problems such as separating the user's voice from the noise of a busy subway station, especially at a distance, while still maintaining high accuracy. The green bar, which looks like part of a jumbotron, acts as a large array of microphones. The black object in the center is an optical camera. In fact, it is this multi-modal voice interaction technology that makes it possible to interact with a machine by voice at a distance, even in a particularly noisy environment. I believe that service machines like this one will soon make more and more appearances in public places and in our daily lives.
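To give a feel for why an array of microphones helps in a noisy space, here is a minimal sketch of delay-and-sum beamforming, a textbook array-processing technique. The talk does not say which algorithm the ticket machine uses, so this is an assumption chosen for illustration; the sample values and integer delays are made up.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its arrival delay, then average.

    channels: list of equal-length sample lists, one per microphone.
    delays:   arrival delay of the target sound at each microphone,
              in whole samples (assumed known for the target direction).
    Sound from the target direction adds up coherently, while sound
    arriving with other delays is averaged down.
    """
    length = len(channels[0]) - max(delays)
    output = []
    for n in range(length):
        total = sum(ch[n + d] for ch, d in zip(channels, delays))
        output.append(total / len(channels))
    return output


# Made-up target signal reaching three microphones 0, 1, and 2 samples late.
signal = [0, 1, 0, -1, 0, 1, 0, -1]
mics = [
    signal + [0, 0],
    [0] + signal + [0],
    [0, 0] + signal,
]

steered = delay_and_sum(mics, [0, 1, 2])    # steered at the speaker
unsteered = delay_and_sum(mics, [0, 0, 0])  # no alignment, signal partly cancels
```

With the correct delays the original signal is recovered exactly; without alignment the copies partially cancel. A real system would also need fractional delays, direction estimation, and, as the talk notes, a camera to help locate the speaker.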


Technology Creates Business

Alibaba Cloud is very serious about how it positions itself in technology. Even though we are a very successful company commercially, we also hope to master the core technologies that underpin interaction. We hope that we will then be able to iterate sustainably and keep improving as our products evolve. For example, we start with the microphone hardware and the sensor hardware, then move to combining microphones into a microphone array, then to signal processing for the array, and at every step from beginning to end we find the top experts in the world to create the technology.

For example, in the area of acoustic design, you may have seen the report that we went to telephone companies to find experts in this field who wanted to work with us to create an optimal end-to-end interactive experience. This becomes even more the case as people grow more familiar with speech recognition, speech synthesis, and voiceprint technology. We also created an industry-grade voice interaction system on Alibaba Cloud that spans cloud and device, with a corresponding customization platform, so that we can quickly adapt to a specific application area.


But we aren't doing the kind of research that sits far above everything else. Our research focuses on how to convert technology into actual products and then realize each product's value. This is something that we spend a lot of time thinking about. At Alibaba, a team will often have someone who leans toward research, someone who is concerned with landing a product, and some who are more inclined toward engineering. When these people come together to form a team, the chemistry between them and the quality of the product they produce are noticeably different.

As you could sense just now, from developing sensors to AI chips, we always start with research, whether that means researching on our own or through mergers. We are very seriously pursuing a customer-facing operating system called AliOS, something we have always called China's most advanced mobile platform.

We also have examples like SAIC and Haier, which I just introduced to you, demonstrating how we develop technology together with leaders in their respective industries. As I mentioned previously, we have already achieved relatively strong coverage on the cloud, and we spend a lot of time thinking about questions of commercial and market coverage, such as how we can use this basic infrastructure to connect devices and people and build commercially successful products. When you create a solid product, you still need to think about factors aside from the tech, such as market capacity, production cost, and price point. Once you have the full suite of capabilities, there is an opportunity to polish these things to maximize the benefits.


The Road Ahead for the IoT

There are always arguments when it comes to IoT research and implementation. For example, in the field of smart homes, there is a lot of debate around whether we should have centralized or decentralized devices. Or the argument might be whether, at this stage, we should treat IoT devices as portals to the internet or merely as devices that complete simple tasks to satisfy customers' needs. Furthermore, as a commercial company, should we make proprietary hardware, or should we build platforms and cooperate with leading hardware companies to accomplish the same thing?

Despite all these uncertainties, there are some things that Alibaba will definitely do. We want to dive deeper into all the technologies that we talked about earlier, and then share these technologies in a way that makes them available and accessible to society. In the field of IoT, Alibaba seeks to build a smart, easy-to-manufacture, integrated IoT solution. This will include hardware modules and software. When you use such a module to connect your device, you can easily enjoy the content and the many internet services that we just talked about, with natural interaction serving as the bridge and link to that content.

We will build some of our own benchmark hardware, but this is just a means to an end. The real purpose is to build a foundational, open platform for society so that a variety of equipment can easily access it, and so that we can reach our goal and vision of 10 billion connected devices in five years.

Read similar articles and learn more about Alibaba Cloud's products and solutions at www.alibabacloud.com/blog.

Alibaba Clouder

Raja_KT March 6, 2019 at 3:54 am

AI-powered voice interaction and serviceable TMall Genie integration to BMW is fascinating. Also others like voice-powered ticketing system, TV are paving new ways for other developments and enhancements.