In January 2018, the International Consumer Electronics Show (CES) kicked off in Las Vegas, Nevada, featuring more than 4,000 exhibitors. CES is the world's largest consumer electronics show and is often described as the "Super Bowl" of global consumer electronics and consumer technology.
Industry giants such as Qualcomm, NVIDIA, Intel, LG, IBM, and Baidu took this opportunity to publicly reveal their latest AI chips, products, and strategies. AI-related technologies and products were among the hottest topics at this year's show, with embedded AI products receiving the most attention.
The leading approach to AI development today is deep learning, whose workflow is divided into two parts: training and inference.
Training usually requires a large amount of input data, or uses methods that do not depend on labeled examples, such as reinforcement learning, to build a complex deep neural network model. Because of the massive training data and the complicated structure of a deep neural network, training demands a vast amount of computation and usually runs on GPU clusters for several days or even weeks. For now, the GPU plays an irreplaceable role in the training phase.
Inference, on the other hand, means taking an already trained model and applying it to new data to "infer" conclusions. A prime example is video surveillance equipment that runs a deep neural network model in the backend to determine whether a detected face belongs to a person on a blacklist. Although inference uses less computational power than training, it still involves a large number of matrix operations.
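To make the "matrix operations" point concrete, here is a minimal sketch of inference with a tiny two-layer network in plain Python. The weights, layer sizes, and the `infer` function are hypothetical stand-ins for a trained model, not taken from any real system; the point is only that each layer boils down to a matrix-vector product followed by a simple nonlinearity.

```python
import math

def matvec(weights, x):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, a) for a in v]

def sigmoid(a):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

# Hypothetical, hand-picked parameters standing in for a trained model.
W1 = [[0.5, -0.2, 0.1],
      [-0.3, 0.8, 0.4]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.5]]
b2 = [0.2]

def infer(features):
    """Forward pass only: no weights are updated during inference."""
    hidden = relu([h + b for h, b in zip(matvec(W1, features), b1)])
    logit = matvec(W2, hidden)[0] + b2[0]
    return sigmoid(logit)

score = infer([1.0, 2.0, 3.0])  # e.g. a match score for one input
```

A production model performs the same kind of computation with millions of weights per layer, which is why inference hardware is judged largely on how quickly it can do these multiply-accumulate operations.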
Currently, mainstream AI chips include GPUs, FPGAs, ASICs, and chips that mimic the human brain's basic learning mechanics.
Field-programmable gate arrays (FPGAs) are reconfigurable integrated circuits that combine a large number of basic logic gates with memory. To implement a particular function, we define the connections between these gates and the memory by loading an FPGA configuration file. The loaded configuration is not permanent and can be replaced.
By loading a different configuration file, the same FPGA can be turned into a completely different processor. It is like a whiteboard that can be written on, erased, and written on again. FPGAs characteristically have low latency, which makes them well suited to inference workloads that must serve large volumes of user requests in real time, such as speech recognition.
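The whiteboard analogy can be illustrated with a toy model of a lookup table (LUT), the basic building block of FPGA logic. This is a simplified sketch, not how any vendor's toolchain works: the `LookupTable` class and the configuration lists are hypothetical, and a real device combines many thousands of such elements with programmable routing.

```python
class LookupTable:
    """Toy 2-input lookup table: the configuration bits are the truth table."""

    def __init__(self, config_bits):
        self.load(config_bits)

    def load(self, config_bits):
        # Loading a new configuration simply replaces the stored truth table.
        if len(config_bits) != 4 or any(b not in (0, 1) for b in config_bits):
            raise ValueError("expected four bits, one per input combination")
        self.bits = list(config_bits)

    def evaluate(self, a, b):
        # The inputs select one stored bit; no logic function is hard-wired.
        return self.bits[(a << 1) | b]

# Truth tables ordered by inputs (0,0), (0,1), (1,0), (1,1).
AND_CONFIG = [0, 0, 0, 1]
XOR_CONFIG = [0, 1, 1, 0]

lut = LookupTable(AND_CONFIG)
and_outputs = [lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)]

lut.load(XOR_CONFIG)  # same "hardware", different function
xor_outputs = [lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)]
```

The same object behaves as an AND gate under one configuration and as an XOR gate under another, which is the sense in which an FPGA can be erased and rewritten.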
Because FPGAs handle computationally intensive tasks with low latency, they tend to be deployed in the cloud, where they serve highly concurrent workloads from massive numbers of users. Compared to GPUs, their lower computational latency can provide a better user experience. The primary players in this field include Intel, Amazon, Baidu, Microsoft, and Alibaba Cloud.
Application-specific integrated circuits (ASICs) are highly customized, non-configurable, specialized chips. Their most notable characteristic is that they require a large up-front R&D investment. Unless a sufficient production volume can be guaranteed, that investment is hard to spread out, and the cost per chip remains high.
Furthermore, once the chip has been taped out, its function is fixed and cannot be changed. If the market for deep learning suddenly changes, the investment already made in a deep learning ASIC will almost certainly be unrecoverable. ASICs therefore carry higher market risk. However, as dedicated chips, ASICs perform better than FPGAs, and at a high enough production volume the cost per chip is also lower than that of an FPGA.
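The volume argument is simple amortization arithmetic: a one-time development cost is divided across every chip produced. The sketch below works through it with purely hypothetical numbers (the dollar figures and function names are illustrative assumptions, not real pricing) to find the volume at which an ASIC becomes cheaper per chip than an FPGA.

```python
import math

def cost_per_chip(one_time_cost, unit_cost, volume):
    """One-time development cost spread over the volume, plus per-unit cost."""
    return one_time_cost / volume + unit_cost

def break_even_volume(asic_nre, asic_unit, fpga_nre, fpga_unit):
    """Smallest volume at which the ASIC is strictly cheaper per chip.

    Returns None if the ASIC's unit cost is not lower than the FPGA's,
    in which case it never catches up.
    """
    if asic_unit >= fpga_unit:
        return None
    # asic_nre/v + asic_unit < fpga_nre/v + fpga_unit
    #   =>  v > (asic_nre - fpga_nre) / (fpga_unit - asic_unit)
    threshold = (asic_nre - fpga_nre) / (fpga_unit - asic_unit)
    return max(1, math.floor(threshold) + 1)

# Hypothetical figures, for illustration only.
ASIC_NRE, ASIC_UNIT = 20_000_000, 50   # large up-front cost, cheap units
FPGA_NRE, FPGA_UNIT = 0, 450           # no up-front cost, expensive units

volume = break_even_volume(ASIC_NRE, ASIC_UNIT, FPGA_NRE, FPGA_UNIT)
asic_low = cost_per_chip(ASIC_NRE, ASIC_UNIT, 10_000)
fpga_low = cost_per_chip(FPGA_NRE, FPGA_UNIT, 10_000)
```

Under these assumed figures, the ASIC is far more expensive per chip at 10,000 units and only pulls ahead once production passes the break-even volume, which is why uncertain demand makes an ASIC a risky bet.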
Google's Tensor Processing Unit (TPU) is an ASIC for accelerating deep learning, and TPUs were used in the AlphaGo system. However, Google's first-generation TPU is suitable only for inference and cannot be used to train models. The recently released second-generation TPU (TPU 2.0) supports inference and can also accelerate the training of deep networks.
According to testing data disclosed by Google, one of its new large-scale translation models used to take a full day to train on 32 of the best commercially available GPUs. It now trains to the same accuracy in just six hours using one-eighth of a TPU pod. (A TPU pod contains 64 second-generation TPUs.)
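As a back-of-the-envelope reading of these figures, the comparison can be reduced to device-hours. The sketch below assumes that "a full day" means 24 hours; the variable names are illustrative, and the result is a rough ratio derived from the quoted numbers, not a benchmark.

```python
GPU_COUNT = 32
GPU_HOURS = 24.0                 # assumption: "a full day" taken as 24 hours
TPU_POD_SIZE = 64                # second-generation TPUs per pod
TPU_COUNT = TPU_POD_SIZE // 8    # one-eighth of a pod
TPU_HOURS = 6.0

# How much sooner the training run finishes.
wall_clock_speedup = GPU_HOURS / TPU_HOURS

# Total device time consumed by each setup.
gpu_device_hours = GPU_COUNT * GPU_HOURS
tpu_device_hours = TPU_COUNT * TPU_HOURS

# How many GPU-hours one TPU-hour replaced in this particular workload.
per_device_ratio = gpu_device_hours / tpu_device_hours
```

With these inputs, one-eighth of a pod is 8 TPUs, the run finishes 4 times sooner, and each TPU-hour stands in for 16 GPU-hours on this one translation model. Other workloads may behave very differently.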
GPU stands for graphics processing unit. GPUs were originally microprocessors used to run graphics operations on personal computers, workstations, game consoles, and some mobile devices, where they could quickly process every pixel in an image. Later, researchers working on deep learning found that GPUs could process massive amounts of data in parallel.
GPUs therefore became the first deep learning chips. In 2011, Professor Andrew Yan-Tak Ng, who founded and led the Google Brain deep learning project, took the lead in using GPUs, with surprising results. Professor Ng's results showed that 12 NVIDIA GPUs could deliver deep learning performance equivalent to 2,000 CPUs. Researchers at NYU, the University of Toronto, and the Swiss AI Lab then accelerated their development of deep neural networks on GPUs.
Technologists chose GPUs as supercomputing hardware because today's most demanding computational problems are well suited to parallel execution. Deep learning, currently the most advanced area of artificial intelligence, is a prime example. It is based on neural networks, which are vast structures made up of extensive, complicated series of node connections.
Training a neural network is loosely similar to how our brains learn: neuronal connections must be built and strengthened from the ground up. This learning process can be carried out in parallel, so GPU hardware can accelerate it. This type of machine learning also requires a large number of examples, and processing those examples can likewise be parallelized.
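One common way to exploit this parallelism is to split the training examples into shards, compute each shard's contribution to the gradient independently, and then combine the results. The sketch below shows that structure for a one-parameter model using Python threads. It is only an illustration under simplifying assumptions: the model, data, learning rate, and function names are hypothetical, and Python threads here show the shape of the computation rather than a real speedup, which on a GPU comes from thousands of hardware cores.

```python
from concurrent.futures import ThreadPoolExecutor

def shard_gradient(w, shard):
    """Sum of d/dw (w*x - y)^2 over one shard of (x, y) examples."""
    return sum(2.0 * (w * x - y) * x for x, y in shard)

def parallel_gradient(w, data, workers=4):
    """Mean gradient over all examples, computed shard by shard."""
    # Each worker receives every `workers`-th example.
    shards = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: shard_gradient(w, s), shards))
    # Shards are independent, so their partial sums simply add up.
    return sum(partials) / len(data)

# Toy data generated from y = 3x; training should recover w close to 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]

w = 0.0
for _ in range(200):
    w -= 0.01 * parallel_gradient(w, data)  # plain gradient descent step
```

Because no shard depends on another, the per-shard work can run at the same time, and only the final summation needs the workers to synchronize.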
Neural network training on a GPU can be many times faster than on a CPU system. At present, NVIDIA holds 70% of the world's GPU chip market. Giants such as Google, Microsoft, and Amazon have also expanded the AI computing power of their data centers by purchasing NVIDIA GPU products. Alibaba Cloud also offers GPU instances for users who are interested in highly parallel processing.
Brain-like architecture is a new type of microchip architecture that mimics the human brain. In such a chip, the memory functions like a synapse, the processor is similar to a neuron, and the communication system is similar to a nerve fiber, which allows developers to design applications for brain-like chips. Through this neural network system, the computer can sense, memorize, and deal with a large number of different situations.
While the adoption of AI chips will demand new technologies and hardware, many current technologies also need to be reworked. Even so, AI has a very bright future.
In this article, we discussed the mainstream types of AI chips, their evolution, and how they can enhance present-day technology. Each is distinct from the others and has its own advantages, and all of them have the potential to revolutionize technology.