Artificial intelligence (AI) is now widely applied and has become indispensable across many industries. How to use AI to boost development is a question every enterprise is considering.
Current AI technology is based on deep learning. To perform complex learning processes, deep learning requires two elements. First, it requires a large amount of data: deep learning relies heavily on data mining to produce enough effective training data. Second, it requires optimization algorithms, because training must search a very complex network for the model parameters that best fit the data.
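As a toy illustration of what an optimization algorithm does, the sketch below fits a straight line by gradient descent on mean squared error. It is a deliberately simplified stand-in: a real deep network has millions of parameters and is usually trained with stochastic variants of this method, but the core update (move each parameter a small step against its gradient) is the same idea. The function name `fit_line`, the learning rate, and the step count are illustrative choices, not anything from the article.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float],
             lr: float = 0.05, steps: int = 2000) -> Tuple[float, float]:
    """Fit y ~ w*x + b by full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every training point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step each parameter against its gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
w, b = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(f"w={w:.3f}, b={b:.3f}")  # prints w=2.000, b=1.000
```

The learning rate matters: too large and the updates overshoot and diverge, too small and training crawls. Choosing and adapting it well is a large part of what "optimization algorithms" for deep learning address.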
Deep learning models are most commonly used in image vision, speech interaction, and natural language processing. Image vision includes image processing and understanding, person recognition, video encoding/decoding and content analysis, and three-dimensional (3D) vision. Speech interaction includes speech recognition, speech synthesis, and speech hardware. Natural language processing (NLP) includes natural language applications, semantic understanding and computing, and machine translation. All of these are AI technologies.
The development of AI has been driven by two key factors. The first is the availability of large amounts of "living" data. Many applications use such data, for example Google's AlphaGo, which defeated the Go world champion in 2016. The second is strong computing power. For example, Google's Waymo self-driving system can drive long distances without human intervention. However, similar achievements date back roughly 20 years. In 1995, a backgammon program became the world backgammon champion after playing 15,000 games against itself. In 1994, the self-driving car ALVINN traveled from the East Coast to the West Coast of the United States at 70 miles per hour. What has changed over the past 20 years is essentially the exponential growth in data and computing power. AI technology has traditionally relied on multiple GPUs to achieve better modeling results.
Natural language processing also has a long history, dating back to when it was called computational linguistics. Traditional computational linguistic methods used statistical probability models to build natural language models. The following figure shows how the sentence "中国鼓励民营企业家投资国家基础建设" (China encourages private entrepreneurs to invest in national infrastructure) can be parsed into a parse tree and divided into subjects, predicates, objects, verbs, and nouns. In other words, the grammatical structure of the sentence can be expressed by the parse tree.
In addition, the most common technology in traditional NLP was the statistical language model. For example, the Pinyin string "ta shi yan jiu sheng wu de" in the following figure can correspond to several different Chinese character strings. A human reading it will most likely conclude that it means "he/she studies biology" (他是研究生物的). Through extensive reading, the human brain builds up a sense of which word sequences are likely and which are not; building a statistical language model is the machine equivalent of this process. The most typical statistical language model is the bigram model, which estimates the probability of each word given the word immediately before it. However, traditional computational linguistic methods suffer from inaccurate models and mediocre text processing performance.
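A bigram model can be sketched in a few lines. The example below is a hypothetical setup, not the system behind the figure: a four-sentence toy corpus that is already segmented into words, add-one (Laplace) smoothing so that unseen word pairs do not get zero probability, and two hand-written candidate segmentations of "ta shi yan jiu sheng wu de". The candidate with the higher log-probability is chosen.

```python
import math
from collections import Counter
from typing import List, Set, Tuple

BOS, EOS = "<s>", "</s>"  # sentence boundary markers


def train_bigram(corpus: List[List[str]]) -> Tuple[Counter, Counter, Set[str]]:
    """Count word pairs and how often each word occurs as the first word of a pair."""
    bigrams: Counter = Counter()
    histories: Counter = Counter()
    vocab: Set[str] = set()
    for sentence in corpus:
        tokens = [BOS] + sentence + [EOS]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            histories[prev] += 1
    return bigrams, histories, vocab


def log_prob(sentence: List[str], bigrams: Counter,
             histories: Counter, vocab: Set[str]) -> float:
    """log P(sentence) = sum of log P(word | previous word), with add-one smoothing."""
    v = len(vocab)
    tokens = [BOS] + sentence + [EOS]
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (histories[prev] + v))
        for prev, word in zip(tokens, tokens[1:])
    )


# Hypothetical toy corpus, already segmented into words.
corpus = [
    ["他", "是", "研究", "生物", "的"],
    ["我", "是", "研究", "生物", "的"],
    ["她", "是", "研究生"],
    ["他", "是", "学生"],
]
model = train_bigram(corpus)

# Two segmentations that both match the Pinyin "ta shi yan jiu sheng wu de".
candidates = [
    ["他", "是", "研究", "生物", "的"],  # "he/she studies biology"
    ["他", "是", "研究生", "物", "的"],  # 研究生 (graduate student) + 物: an unlikely split
]
best = max(candidates, key=lambda s: log_prob(s, *model))
print("".join(best))  # prints 他是研究生物的
```

The second candidate loses because the pairs (研究生, 物) and (物, 的) never occur in the corpus, so they receive only the small smoothed probability. A real input method would score many more candidates, typically with a lattice search rather than an explicit list.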
Given the limitations of traditional methods, deep learning has been applied to NLP with good results. The most successful type of deep learning model in this field is the deep language model. Unlike traditional methods, it represents the context information of every word as a tensor. It can also model context bidirectionally, drawing on both the words before and the words after a given position. In addition, deep language models use the Transformer architecture to better capture the relationships between words.
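The core of the Transformer is scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, in which every word's new representation is a weighted average over all words in the sentence. The pure-Python sketch below is simplified under stated assumptions: it omits the learned projection matrices, multiple heads, and positional encodings of a real Transformer, and simply uses the input vectors themselves as queries and keys. Because no mask is applied, each position can attend to words both before and after it, which is what makes the context bidirectional.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V, one query row at a time."""
    d = len(k[0])
    out = []
    for qi in q:
        # Similarity of this word's query to every word's key (earlier AND later words).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        # New representation = attention-weighted average of all value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


# Three toy 2-D "word vectors"; real models learn separate Q/K/V projections.
x = [[10.0, 0.0], [0.0, 10.0], [0.0, 0.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Using the identity as V makes each output row equal to that word's attention weights.
weights = attention(x, x, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Here the first two words attend almost entirely to themselves, while the all-zero third vector spreads its attention evenly (1/3 each) across all three words. In a trained model, the learned projections decide which words should attend to which.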
For a long time, speech synthesis has been viewed as an encoding process that converts text into speech signals. Speech recognition is the reverse: a decoding process that recovers text from speech signals.
Generally, speech recognition involves two models: a language model and an acoustic model. The language model predicts the probability of a word or word sequence W. The acoustic model predicts the probability of observing acoustic features X given the pronunciation of W.
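The two models are combined through Bayes' rule. In the standard formulation, the recognizer picks the word sequence that is most probable given the observed features X. Because P(X) is the same for every candidate W, it can be dropped, leaving the product of the acoustic model score and the language model score:

```latex
\hat{W} = \arg\max_{W} P(W \mid X)
        = \arg\max_{W} \frac{P(X \mid W)\, P(W)}{P(X)}
        = \arg\max_{W} \underbrace{P(X \mid W)}_{\text{acoustic model}} \, \underbrace{P(W)}_{\text{language model}}
```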
The traditional hybrid speech recognition system is known as GMM-HMM (Gaussian Mixture Model-Hidden Markov Model). In this system, the HMM models how speech units unfold over time and the GMM models the distribution of acoustic features in each HMM state; together they form the acoustic model, while a separate statistical model, typically an n-gram model, serves as the language model. Despite great efforts in the field, machine speech recognition long lagged far behind human speech recognition. After 2009, speech recognition systems based on deep learning began to develop. In 2017, Microsoft claimed that its speech recognition system showed significant improvements over traditional systems and even surpassed human performance.
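To make the GMM-HMM idea concrete, the sketch below decodes a sequence of acoustic frames with the Viterbi algorithm, where each HMM state scores a frame with its own Gaussian mixture. It is a simplified illustration under stated assumptions: features are single numbers rather than real multi-dimensional acoustic feature vectors, the two-state left-to-right model and all its parameters are hand-picked rather than trained, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A 1-D Gaussian mixture: list of (weight, mean, variance) components.
Mixture = List[Tuple[float, float, float]]


def safe_log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def gmm_log_likelihood(x: float, mixture: Mixture) -> float:
    """log p(x | state) under a 1-D Gaussian mixture, computed with log-sum-exp."""
    logs = [math.log(w) - 0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            for w, mean, var in mixture]
    m = max(logs)
    return m + math.log(sum(math.exp(lg - m) for lg in logs))


def viterbi(frames: Sequence[float], start: List[float],
            trans: List[List[float]], mixtures: List[Mixture]) -> List[int]:
    """Most likely HMM state sequence for the frames (log-space Viterbi)."""
    n = len(start)
    # score[s]: best log-probability of any path ending in state s at the current frame.
    score = [safe_log(start[s]) + gmm_log_likelihood(frames[0], mixtures[s])
             for s in range(n)]
    backpointers = []
    for x in frames[1:]:
        new_score, pointers = [], []
        for s in range(n):
            # Best predecessor state for s, including the transition cost.
            prev = max(range(n), key=lambda p: score[p] + safe_log(trans[p][s]))
            pointers.append(prev)
            new_score.append(score[prev] + safe_log(trans[prev][s])
                             + gmm_log_likelihood(x, mixtures[s]))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical two-state left-to-right model (for example, two parts of one phone).
start = [1.0, 0.0]
trans = [[0.6, 0.4], [0.0, 1.0]]
mixtures = [
    [(0.5, 0.0, 1.0), (0.5, 1.0, 1.0)],  # state 0: two components near 0 and 1
    [(1.0, 5.0, 1.0)],                  # state 1: one component near 5
]
path = viterbi([0.2, 0.8, 4.9, 5.3], start, trans, mixtures)
print(path)  # prints [0, 0, 1, 1]
```

In a real system the mixture weights, means, variances, and transition probabilities are estimated from data (classically with the EM algorithm), and the decoder searches over HMMs for whole words, constrained by the language model.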
Traditional hybrid speech recognition systems contain separately optimized acoustic models, language models, and linguist-designed pronunciation models, which makes them complicated to build. Multiple components must be developed in parallel, and because each model is optimized independently, the system as a whole is never optimized jointly, which leads to unsatisfactory overall results.
To address these problems, end-to-end speech recognition systems combine the acoustic model, decoder, language model, and pronunciation model so that they can be developed and optimized as a single unit. This joint optimization allows such systems to achieve better overall performance. Experimental results show that end-to-end systems can reduce the recognition error rate by more than 20%. In addition, the model is significantly smaller, only a fraction of the size of a traditional speech recognition model. End-to-end speech recognition systems can also be deployed in the cloud.
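Recognition error is usually reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference, divided by the number of reference words. For Chinese, the same computation is commonly done over characters instead (character error rate). The sketch below computes WER with an edit-distance table. The article does not say whether its 20% figure is an absolute or a relative reduction; the `relative_reduction` helper assumes relative, and the error rates in the usage example are invented for illustration.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """(substitutions + deletions + insertions) / len(reference), via edit distance."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # dist[i][j]: edits to turn the first i reference words into the first j hypothesis words.
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i reference words
    for j in range(cols):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # match or substitution
    return dist[-1][-1] / len(reference)


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional drop in error rate relative to the baseline."""
    return (baseline - new) / baseline


wer = word_error_rate("the cat sat on the mat".split(), "the cat sit on mat".split())
print(f"WER = {wer:.3f}")  # 1 substitution + 1 deletion over 6 words: prints WER = 0.333
# Invented numbers: a baseline WER of 10% falling to 8% is a 20% relative reduction.
print(f"{relative_reduction(0.10, 0.08):.0%}")  # prints 20%
```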
The most important capability in vision technology is image search and recognition, which has developed over a long period. In the early 1990s, search was based on global low-level features, such as the distribution of color in an image. However, this method was not accurate: its top-5 accuracy on ImageNet was only about 30%. By the early 2000s, images were searched and recognized based on local feature encoding, and accuracy reached 70%. However, these local features had to be designed by hand, so features that the designers had not anticipated could not be extracted. Around 2010, developers began using deep learning to extract local features automatically, reaching an accuracy of 92%. This meant that image search was ready for commercial use. The following image shows the history of image search and recognition technology.
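The early global low-level feature approach can be sketched as color-histogram retrieval: each image is reduced to a normalized histogram of quantized RGB values, and database images are ranked by histogram intersection with the query. This is a generic textbook technique chosen for illustration, not necessarily what the systems in the figure used; images here are plain lists of RGB tuples, and the tiny database is made up.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def color_histogram(pixels: List[Pixel], bins: int = 4) -> List[float]:
    """Global feature: share of pixels in each of bins**3 quantized RGB cells."""
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in [0, bins - 1].
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1
    return [h / len(pixels) for h in hist]


def histogram_intersection(h1: List[float], h2: List[float]) -> float:
    """Similarity in [0, 1]: overlap between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def search(query: List[Pixel], database: Dict[str, List[Pixel]]) -> List[str]:
    """Rank database images by color-histogram similarity to the query."""
    q = color_histogram(query)
    return sorted(database,
                  key=lambda name: histogram_intersection(q, color_histogram(database[name])),
                  reverse=True)


# Made-up "images", each just a flat list of pixels.
database = {
    "red_car": [(250, 10, 10)] * 90 + [(200, 200, 200)] * 10,
    "sky": [(30, 120, 250)] * 100,
    "grass": [(20, 200, 30)] * 100,
}
query = [(240, 20, 15)] * 80 + [(210, 210, 210)] * 20
ranking = search(query, database)
print(ranking[0])  # prints red_car
```

The sketch also shows why this approach was inaccurate: the histogram discards all spatial layout, so a red car and a red apple on a gray table would look nearly identical to it.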
Currently, image search applications face three major challenges. First, data volumes keep growing, with training datasets containing billions of samples. Second, image search must handle hundreds of millions of categories. Third, model complexity keeps increasing.
To address these challenges, Alibaba launched Jiuding, a large-scale AI training engine. Jiuding is a large-scale training platform that incorporates expertise in vision, NLP, and other fields. It consists of two parts. The first is communication: all large-scale training relies on multi-machine, multi-GPU architectures, so improving training efficiency under such architectures and reducing communication cost is a very important area of research. The second is optimization algorithms, since implementing distributed optimization is also a major challenge. With this engine, classification models can be trained on very large datasets with good results. For example, it can complete ImageNet ResNet50 training in 2.8 minutes, and training a classifier over billions of images with hundreds of millions of IDs can be completed within seven days.
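The article does not describe how Jiuding reduces communication cost. As a hedged illustration of the problem, the sketch below simulates ring all-reduce, a widely used scheme for summing gradients across workers in data-parallel training. Each worker's buffer is split into n chunks; a reduce-scatter phase passes partial sums around the ring until each worker owns one fully summed chunk, and an all-gather phase then circulates those chunks so every worker ends with the full sum. The simulation runs in a single process, with Python lists standing in for GPU buffers.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate ring all-reduce: every worker ends with the element-wise sum of all buffers."""
    n = len(buffers)
    length = len(buffers[0])
    bufs = [list(b) for b in buffers]  # copy so the inputs are not modified
    bounds = [length * k // n for k in range(n + 1)]  # chunk k covers bounds[k]:bounds[k+1]

    def chunk(k: int) -> range:
        k %= n
        return range(bounds[k], bounds[k + 1])

    # Phase 1, reduce-scatter: at step s, worker i sends chunk (i - s) to its right
    # neighbour, which adds it in. After n-1 steps worker i owns the full sum of chunk i+1.
    for step in range(n - 1):
        messages = [((i + 1) % n, (i - step) % n, [bufs[i][j] for j in chunk(i - step)])
                    for i in range(n)]  # gather all sends first: they happen "simultaneously"
        for dst, c, values in messages:
            for j, v in zip(chunk(c), values):
                bufs[dst][j] += v

    # Phase 2, all-gather: fully summed chunks travel around the ring, overwriting stale copies.
    for step in range(n - 1):
        messages = [((i + 1) % n, (i + 1 - step) % n, [bufs[i][j] for j in chunk(i + 1 - step)])
                    for i in range(n)]
        for dst, c, values in messages:
            for j, v in zip(chunk(c), values):
                bufs[dst][j] = v
    return bufs


# Three hypothetical workers, each holding a local gradient vector.
result = ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
print(result)  # prints [[111, 222, 333], [111, 222, 333], [111, 222, 333]]
```

In this scheme each worker sends 2(n-1) chunks of roughly L/n elements, where L is the buffer length, so per-worker traffic stays close to 2L no matter how many workers join. That property is why ring-based collectives are a common baseline for multi-machine, multi-GPU training.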
Image search is widely used in real-world scenarios. Pailitao, for example, handles ultra-large-scale image recognition and search tasks involving more than 400 million products, more than 3 billion images, and more than 20 million active users. It can identify more than 30 million entities, such as SKU products, animals, plants, and vehicles.
Tianxun is a recognition and analysis application for remote sensing images. It supports large-scale training on remote sensing images and handles tasks such as road network extraction, terrain classification, and identification of new and illegal buildings.
Image segmentation separates objects from the rest of an image. The traditional image segmentation method, shown in the figure below, divides an image into segments based on the similarity between pixels, aggregating areas of similar pixels for output. However, traditional segmentation cannot learn semantic information: it knows there is an object in the image but not what the object is. In addition, because it is an unsupervised method, it is not good at handling corners.
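A classic unsupervised method of this kind is region growing, sketched below on a small grayscale image stored as a list of rows. Starting from each unlabeled pixel, it absorbs 4-connected neighbors whose intensity differs from the current pixel by at most a threshold. The function name, the threshold value, and the toy image are illustrative assumptions rather than the method in the figure.

```python
from collections import deque
from typing import List


def region_grow(image: List[List[int]], threshold: int) -> List[List[int]]:
    """Label 4-connected regions of similar intensity; returns a region ID per pixel."""
    h, w = len(image), len(image[0])
    labels = [[-1] * w for _ in range(h)]  # -1 means "not yet assigned"
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            # Start a new region at this unlabeled pixel and flood outward (BFS).
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and abs(image[ny][nx] - image[y][x]) <= threshold):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels


# Toy image: a dark area on the left, a bright area on the right.
image = [
    [10, 12, 200, 205],
    [11, 13, 198, 202],
    [10, 11, 199, 201],
]
labels = region_grow(image, threshold=10)
for row in labels:
    print(row)
```

The output is only region IDs (0 for the dark block, 1 for the bright block). Nothing in it says what either region depicts, which is the missing semantic information the paragraph points out.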
In contrast, segmentation based on deep learning, shown in the figure below, uses supervised learning on many labeled training samples. Segmentation and classification results are obtained at the same time, so the model can determine which object instance each pixel belongs to. Given a large volume of relevant data, an encoder-decoder model can partition object edges finely.
Alibaba applies image segmentation to all product categories in the Taobao ecosystem. With this technology, Alibaba can automatically generate product images on a white background, which speeds up product publishing.
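Once a segmentation model has produced a mask for the product, generating the white-background image is a simple per-pixel blend. The sketch below assumes a soft mask with values from 0.0 (background) to 1.0 (product), so that partially covered edge pixels fade smoothly into white. The function name and the list-of-tuples image format are illustrative, not Alibaba's actual pipeline.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def to_white_background(image: List[List[Pixel]],
                        mask: List[List[float]]) -> List[List[Pixel]]:
    """Alpha-blend every pixel with white: mask 1.0 keeps the product, 0.0 becomes white."""
    out = []
    for row, mask_row in zip(image, mask):
        out.append([
            tuple(round(alpha * c + (1 - alpha) * 255) for c in px)
            for px, alpha in zip(row, mask_row)
        ])
    return out


# Hypothetical 1x2 image: a product pixel and a background pixel.
image = [[(10, 20, 30), (100, 100, 100)]]
mask = [[1.0, 0.0]]
result = to_white_background(image, mask)
print(result)  # prints [[(10, 20, 30), (255, 255, 255)]]
```

Using a soft mask rather than a hard 0/1 mask avoids jagged, haloed edges around hair, fabric, and other fine boundaries, which is where the precise edge partitioning described above pays off.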
The technology can also be used to mix and match clothing. When merchants provide images of models, segmentation can be used to dress the models in different outfits.