Recently, in the second Visual Dialog Challenge, the Alibaba AI team defeated the other nine participants (including Microsoft and Seoul National University) and won the championship.
(Alibaba AI won the Visual Dialog Challenge)
Launched jointly by institutions including the Georgia Institute of Technology and Facebook AI Research (FAIR), together with CVPR, the premier computer vision conference, the Visual Dialog Challenge is currently the most authoritative visual dialog competition.
In this challenge, an AI agent sees nearly 10,000 pictures and must then answer any question related to any of them. According to the competition results, Alibaba AI won the event with an accuracy rate of 74.57%, 16.82% higher than the previous winner. On the same data set, human accuracy on this task is only 64.27%.
Traditional visual AI is mainly about detecting and recognizing targets, for example, detecting whether a picture contains a cat. However, it is not good at understanding and deducing the logical relations between targets in complex scenarios. As a result, traditional AI cannot answer complex questions (for example, "What color are the clothes on the boy next to the cat?") and cannot convert picture information into language output that humans can understand.
Alibaba's AI breakthrough lies in its recursive exploration dialog model, which integrates three major capabilities: image recognition, relation inference, and natural language comprehension. By making efficient use of annotation information, Alibaba AI learns and simulates the way humans think in complex scenarios. It can therefore efficiently recognize the entities in a picture and the relationships between them, deduce the content of the picture, understand the questions humans raise and their real intentions through context-based modeling, and provide accurate answers.
(In the visual dialog, the AI bot on the left side can correctly answer questions from the human on the right side)
Visual dialog is a new AI research trend that has gained significant momentum in recent years. Its objective is to teach machines to discuss visual content with humans in natural language. If visual recognition gives machines the ability to see, visual dialog technology gives them the ability to understand the visual world and make correct references. This indicates that AI cognition has reached a new level.
(Visual dialog technology is expected to improve the efficiency of earthquake rescue)
It is reported that this technology will be applied in many human-machine interaction scenarios. Search and rescue robots can search for survivors in ruins in a more timely and efficient manner based on directives and scenario information. Visually impaired people can use Alibaba AI to understand the content of pictures and know their surroundings. Autonomous vehicles can gain a more accurate understanding of the intentions of impact factors and provide passengers with a better experience.