By Ren Xiaofeng from Amap Tech.
Travel is an important part of our daily life. We are accustomed to navigation tools, but do you know the underlying data and algorithms that are required to support a navigation app? Do you know how algorithms can improve and innovate the travel experience? In a recent Alibaba CIO Academy live broadcast on technologies to fight the pandemic, Ren Xiaofeng, Chief Scientist at Amap, explained the evolution of the algorithms behind Amap by looking at map production, search recommendations, route planning, and other advanced features.
Amap, formerly AutoNavi, is a map application by Alibaba Group and is a leading mapping service provider in China. It has become a popular tool for traveling in China, with more than 100 million daily active users and over 400 million monthly active users. Amap's services cover information services, driving navigation, shared travel, smart public transportation, smart attractions, cycling, walking, and long-distance travel. More than a tool for navigation, Amap is also an Internet infrastructure. Just as Taobao establishes relationships between consumers and products and Alipay establishes relationships between people and funds, Amap establishes relationships between people and places. Even more importantly, it establishes relationships between people and the real world. Amap's mission is to connect the world and make travel more wonderful. The Amap app is only the tip of the iceberg, the minority of our work that everyone can see. There is a great deal of work hidden from everyone's view, such as positioning, route planning, road condition detection, estimated time of arrival (ETA), online ride hailing, freight transportation, location-based service (LBS) games, and route planning based on supplies and demands. All of this requires a lot of technical support.
Map production is a very complicated process. As shown in the following figure, we must first collect information and then use algorithms to automatically recognize items of interest. Currently, fully automatic recognition is not implemented. Manual correction is still required before information is imported into the map service.
Map production algorithms: Map production algorithms can be roughly divided into two categories. The first is road sign recognition, with algorithms that recognize speed limit signs and no parking signs, as shown in the following figure. In addition to road information, recognition algorithms for point-of-interest (POI) signs are used to identify shops along the road and other points of interest. The implementation of these two types of algorithms must overcome many difficulties in real-world applications.
Road sign recognition: Consider road sign recognition, as shown in the following figure. We first need to use a target detection algorithm to find road signs in images. Next, we must classify the detected signs, such as recognizing the do-not-enter sign in the following figure. Finally, we must perform text recognition to identify the specific content of the sign, which indicates that it applies to long-distance buses between 2:00 and 5:00. The road sign recognition process may seem simple, but we must overcome various difficulties to implement it.
Challenges in road sign recognition:
(1) There are many kinds of road signs in the real world. The signs shown in the following figure are not even all possible signs. The number of different road signs poses major challenges to the classification task.
(2) Poor image quality also increases the difficulty of road sign recognition. Image quality problems include distortion, reflection, occlusion, low resolution, and image compression. The influence of lighting conditions and weather can lead to very poor image quality.
Solving image distortion is equivalent to camera auto-calibration. This process must consider the internal parameters of the camera, such as the focal length, center, and distortion, as well as unknown external parameters, such as the position and angle. Producing standard images would require running the calibration algorithm on every individual camera, which is not feasible. At present, we can use multi-source image matching and better-quality cameras to solve the camera auto-calibration problem to a considerable extent.
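To make the internal parameters concrete, here is a minimal sketch of a pinhole camera projection with two radial distortion terms. The function name `project_point` and the parameter names (`fx`, `fy`, `cx`, `cy`, `k1`, `k2`) are illustrative assumptions and do not describe Amap's calibration code. Calibration amounts to estimating these values so that distorted pixels can be mapped back to their ideal positions.

```python
def project_point(point_cam, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Project a 3D point (camera coordinates) to pixel coordinates.

    fx, fy: focal lengths in pixels; cx, cy: principal point (image center);
    k1, k2: radial distortion coefficients (0 means an ideal pinhole camera).
    """
    X, Y, Z = point_cam
    x, y = X / Z, Y / Z                      # normalized image coordinates
    r2 = x * x + y * y                       # squared distance from the optical axis
    scale = 1.0 + k1 * r2 + k2 * r2 * r2     # radial distortion factor
    return fx * x * scale + cx, fy * y * scale + cy


# The same 3D point lands on different pixels once distortion is present.
ideal = project_point((1.0, 0.5, 2.0), 1000, 1000, 640, 360)
distorted = project_point((1.0, 0.5, 2.0), 1000, 1000, 640, 360, k1=-0.2)
```

With a negative `k1`, the point is pulled toward the image center, which is the kind of deviation a calibration step has to undo before signs can be measured reliably.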
We can use image quality enhancement methods to solve image quality problems. In the following figure, source images are on the left. You can zoom in on the source images to see some text, but some details are not clear. The images on the right show that the text can be made clearer through image enhancement. Image enhancement can improve the accuracy of recognition algorithms. It can also be used for fuzzy detection and to improve the efficiency of manual recognition.
(3) The need to detect small targets is a common problem in the field of image detection. In the following figure, you can probably guess that the highlighted object is a camera even though it is very small. However, if it is magnified, the resolution will be very poor, and the information contained in the small target itself is very limited. Peripheral information helps detect small targets. Therefore, we introduce the Attention mechanism and use a priori knowledge, such as the distribution, height, and size of cameras, to help solve the small-target detection problem.
(4) In real-world applications, detecting changes is more important than detecting small targets. For example, it is difficult for humans to tell whether the cameras in the two images below are the same because of changes in the time of day, weather, capture device, and other factors. In contrast, algorithms can detect the position, associated lane, and structure type of each camera and perform scenario analysis to determine whether the two images show the same camera.
Recognition algorithms for POI signs: In a real street environment, the signs of stores and other POIs along a road are very complex and dense, so POI sign recognition still faces many practical problems.
Challenges in POI sign identification:
(1) On actual streets, POIs mark their locations in a variety of ways, such as arches, nameplates, hanging signs, or lettering on the building. Many objects that look like POI signs are not, such as banners with slogans, billboards, slogans painted on walls, traffic signs, couplets, and license plates.
(2) After POI recognition, we must perform text recognition and extraction. For this process, in addition to the variety of POI sign types, there are also problems such as the uneven density of signs, their special shapes, and unclear and incomplete signs. These complex problems need to be solved by combining various technologies, such as multi-level cascade detection models, text detection and recognition, three-dimensional reconstruction position matching, and fuzzy and obstructed text detection technologies.
(3) In addition to text recognition, POI typesetting recognition also needs to analyze the face of the sign and understand the main store name, branch name, contact information, business scope, and non-POI text on the sign. Therefore, POI text recognition needs to recognize attributes first, integrate features according to text semantics, images, and locations, and then determine the context according to scene understanding and peripheral context information.
On the whole, to ensure and improve the accuracy of map data, the degree and efficiency of automated map production based on images are very important. In addition to concentrating on improving algorithms and multi-source data, Amap is constantly introducing new technologies. In the future, Amap hopes to put its algorithms on terminals to help us better understand road scenarios in real-time and collect map information in a faster and more accurate manner.
Our map app provides many types of search functions. First, users can enter a short description of the destination to allow the app to perform a more precise search on the map and return the result. Users can also search by category or brand by entering "dining" or a particular entertainment brand. During trips, the app also provides in-transit navigation search functions.
Features and challenges of map search: The following figure compares map searches with traditional e-commerce and webpage searches from a technical perspective. Webpage searches scan long non-structured text on a very large scale, with hundreds of billions of possible results. E-commerce searches scan billions of products and return a list of images. In contrast, the input for a map search is a short string of structured text that describes a POI, and the search must scan tens of millions of POIs. However, the accuracy of a map search must be relatively high, and map location information must be collected to support the search algorithm.
Evolution of Amap search technology: Amap started building a search system in 2010. Since 2014, we have introduced a series of high-end technologies to build a search expert system. We have fully integrated machine learning and deep learning to build a search mid-end and create an end-to-end business channel platform. We also provide support for Shenma, Cainiao, Banma, Ele.me, and other services.
Fuzzy search example: The following figure shows a search for "Hunan Provincial Human Resources and Social Security Hall" (湖南省人力资源社会保障厅). If the input statement is completely correct, the system can directly find the destination. However, input statements often contain mistakes, so the search algorithm needs to introduce geographical error correction for high and low frequency error correction, apply semantic rewriting through semantic matching, and establish a spatial model of the text through spatial relations. The search algorithm not only solves simple sentence matching but also identifies the intention of the search. This includes judging whether the search scope falls into the current region or an external region, whether an exact search or reverse search is required, whether the search is for real-time information or research, and whether the purpose of the trip is tourism or business.
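The error-correction step can be illustrated with a minimal sketch based on edit distance. This is only one simple technique for tolerating typos; the source does not say which methods Amap uses, and the function names `edit_distance` and `correct_query`, the `max_ratio` threshold, and the candidate list are all assumptions made for illustration.

```python
from typing import List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def correct_query(query: str, poi_names: List[str],
                  max_ratio: float = 0.3) -> Optional[str]:
    """Return the closest known POI name, or None if nothing is close enough."""
    best_name, best_dist = None, None
    for name in poi_names:
        d = edit_distance(query, name)
        if best_dist is None or d < best_dist:
            best_name, best_dist = name, d
    if best_name is None:
        return None
    # Accept the match only if a small fraction of characters had to change.
    limit = max_ratio * max(len(query), len(best_name))
    return best_name if best_dist <= limit else None


names = ["湖南省人力资源社会保障厅", "湖南省博物馆"]
fixed = correct_query("湖南省人力资源社会保章厅", names)  # one wrong character
```

A production system would combine a signal like this with semantic rewriting and spatial context, as described above, because pure string similarity cannot tell which of two similarly named places the user means.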
Multi-source geographic information database: Amap hopes to build a comprehensive information database based on geographical locations. Here, the geographical location information includes locations, road networks, buildings, rooms, and other information. It also incorporates name, type, function, time, and comment information. This information is obtained from many sources, including image acquisition, text big data, search big data, trajectory big data, user input, and industry information. A major challenge in working with algorithms is finding ways to integrate multi-source information to build an accurate and rich comprehensive information database.
Evolution of Amap route planning: The following figure shows the evolution of Amap route planning. Amap has offered route planning services since 2004. With the continuous improvement and evolution of algorithms, Amap is currently studying multi-objective algorithms that can be used to meet users' needs and quickly plan better routes.
Path planning challenges: In real-world scenarios, route planning technology needs to solve the problem of ultra-large-scale real-time route planning. The challenges are threefold. First, road networks are very large: there are more than 40 million roads in China. Second, road attributes change frequently, with about 10% of roads updated every quarter. Third, road condition information must be updated every few minutes.
As a practical problem, ultra-large-scale real-time route planning is very different from the shortest path algorithms studied in academia. An important method to improve the efficiency of standard algorithms is to introduce preprocessing, which can help solve the difficulties of large-scale and real-time route planning. There are many preprocessing algorithms, such as Arc Flags and Multi-Layers.
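For reference, the textbook baseline that preprocessing techniques aim to accelerate is Dijkstra's algorithm. The sketch below is a plain implementation on a small hypothetical graph. The function name `shortest_path` and the adjacency-list format are assumptions for illustration.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm. graph maps node -> list of (neighbor, weight)."""
    dist = {source: 0.0}
    parent = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break                      # the first time target is popped, d is final
        if d > dist.get(node, float("inf")):
            continue                   # stale heap entry
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                parent[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:          # walk parents back to the source
        path.append(parent[path[-1]])
    path.reverse()
    return dist[target], path


roads = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 8)],
    "B": [("D", 5)],
}
cost, route = shortest_path(roads, "A", "D")
```

On a network with tens of millions of roads, a query like this may touch a large part of the graph, which is why preprocessing that prunes or summarizes the search space matters.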
The academic community has proposed stronger algorithms for ultra-large-scale shortest path problems, such as TNR (transit node routing), CH (contraction hierarchies), and CBR. The challenge is to choose the algorithm that will deliver the best results in real-world scenarios. We need to strike a balance between algorithm performance and preprocessing performance based on actual requirements such as scale, real-time performance, and road condition updates.
To meet actual requirements, we must first ensure the performance of the planning algorithm. Real-time requirements need to be met, such as hourly road network structure updates and road condition updates every few minutes. In essence, the need for strong real-time performance promotes improvements in algorithm designs. The basic algorithms are generally layered. Multiple cells are divided for pre-calculation to find the shortest path. The cells are partitioned based on the road network structure, and their weights are updated. In terms of hardware, our system supports multi-core concurrency and cache optimization based on a large memory.
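The cell-based idea can be sketched as follows. This is a simplified, two-level illustration under stated assumptions, not Amap's implementation: nodes are assigned to cells by a hypothetical `cell_of` mapping, shortcut distances between each cell's boundary nodes are precomputed, and a query searches full detail only in the source and target cells. All function names are invented for this example.

```python
import heapq
from collections import defaultdict


def dijkstra(adj, source):
    """Distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def boundary_nodes(graph, cell_of):
    """Nodes that touch an edge crossing between two cells."""
    result = set()
    for u, edges in graph.items():
        for v, _ in edges:
            if cell_of[u] != cell_of[v]:
                result.update((u, v))
    return result


def precompute_cell(graph, cell_of, cell, boundary):
    """Preprocessing: shortcut edges between the boundary nodes of one cell."""
    sub = {u: [(v, w) for v, w in edges if cell_of[v] == cell]
           for u, edges in graph.items() if cell_of[u] == cell}
    shortcuts = {}
    for b in boundary:
        if cell_of[b] == cell:
            dist = dijkstra(sub, b)
            shortcuts[b] = [(t, d) for t, d in dist.items()
                            if t in boundary and t != b]
    return shortcuts


def query(graph, cell_of, shortcuts_by_cell, source, target):
    """Use real edges in the source/target cells and shortcuts everywhere else."""
    detailed = {cell_of[source], cell_of[target]}
    adj = defaultdict(list)
    for u, edges in graph.items():
        for v, w in edges:
            if cell_of[u] != cell_of[v] or cell_of[u] in detailed:
                adj[u].append((v, w))
    for cell, shortcuts in shortcuts_by_cell.items():
        if cell not in detailed:
            for u, edges in shortcuts.items():
                adj[u].extend(edges)
    return dijkstra(adj, source).get(target, float("inf"))


graph = {
    "a1": [("a2", 1)], "a2": [("b1", 1)],
    "b1": [("b2", 2), ("b3", 10)], "b2": [("b3", 2)], "b3": [("c1", 1)],
    "c1": [("c2", 1)],
}
cell_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "b3": "B",
           "c1": "C", "c2": "C"}
boundary = boundary_nodes(graph, cell_of)
shortcuts = {c: precompute_cell(graph, cell_of, c, boundary) for c in "ABC"}
before = query(graph, cell_of, shortcuts, "a1", "c2")

# A traffic update inside cell B only requires recomputing cell B.
graph["b1"] = [("b2", 20), ("b3", 10)]
shortcuts["B"] = precompute_cell(graph, cell_of, "B", boundary)
after = query(graph, cell_of, shortcuts, "a1", "c2")
```

The design choice this illustrates is the separation of structure from weights: the partition depends only on the road network, so a road condition update touches only the shortcuts of the affected cell instead of the whole preprocessing result.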
A massive volume of route information is produced in real life. We must find a way to mine the specific information that we need from spatiotemporal big data. Specifically, we need to mine information about POIs, new roads, accidents, and congestion.
Challenges of spatiotemporal big data mining:
In real-world applications, data mining must overcome many challenges, such as inaccurate trajectories, complex behaviors, and demanding timeliness requirements. Currently, through multi-source information integration and spatiotemporal models such as RNN, LSTM, CTC, TCN, and GCN, specific model improvements can provide support for feedback loops and data backflow.
Example of outdated POI mining: Outdated POI mining features spatial topology depiction and the integration of features from multiple sources. In this case, we can use the Wide & Deep model to process the static characteristics of POIs with a wide model and process time-series traffic characteristics with a deep network. Amap has also evolved a variety of models in this area. For example, we upgraded from a Recurrent Neural Network (RNN) to DeepTCN to better mine POI expiration information.
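The Wide & Deep structure can be sketched in a few lines. The example below is a toy forward pass only: the deep part is a single fully connected layer standing in for the time-series network, the weights are hand-picked rather than trained, and names such as `outdated_probability` and the feature layout are assumptions for illustration.

```python
import math


def dense_relu(inputs, weights, biases):
    """One fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def outdated_probability(static_features, traffic_series, params):
    """Combine a wide (linear) part and a deep part into one probability."""
    # Wide part: a linear model over static POI attributes.
    wide_logit = sum(w * x for w, x in zip(params["wide_w"], static_features))
    # Deep part: a small network over the recent traffic time series.
    hidden = dense_relu(traffic_series, params["deep_w1"], params["deep_b1"])
    deep_logit = sum(w * h for w, h in zip(params["deep_w2"], hidden))
    # Both parts are summed before the sigmoid, so they are scored jointly.
    logit = wide_logit + deep_logit + params["bias"]
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical toy weights: the hidden unit fires when traffic is low.
PARAMS = {
    "wide_w": [0.5, -0.5],
    "deep_w1": [[-1.0, -1.0, -1.0, -1.0]],
    "deep_b1": [4.0],
    "deep_w2": [1.0],
    "bias": -0.5,
}
busy = outdated_probability([1.0, 0.0], [2.0, 2.0, 2.0, 2.0], PARAMS)
quiet = outdated_probability([1.0, 0.0], [0.0, 0.0, 0.0, 0.0], PARAMS)
```

With these toy weights, a POI whose recent traffic has dropped to zero receives a much higher score than one with steady traffic, which mirrors the intuition that a sudden loss of visits suggests a closed or relocated POI.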
Example of new road mining: To address the problem of new roads, in addition to designing algorithms for specific issues, fault tolerance is also required, such as trajectory features from different zooms, seed points obtained through mean shift trajectories, and principal curves. We can also use probabilistic models for short roads, connecting roads, and other specific scenarios. In addition, the CNN end-to-end model collects spatial description information from thermal and satellite images and depicts driving behavior through trajectory and road network connectivity.
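The seed-point step can be illustrated with a minimal mean shift sketch over 2D trajectory points. It uses a flat kernel and a fixed number of iterations. The function name `mean_shift_seeds`, the `bandwidth` value, and the merge rule are assumptions for illustration, since the source does not describe the exact variant used.

```python
def mean_shift_seeds(points, bandwidth, iterations=30):
    """Move each point toward the local mean until points gather at density peaks."""
    h2 = bandwidth ** 2
    modes = list(points)
    for _ in range(iterations):
        shifted = []
        for mx, my in modes:
            near = [(x, y) for x, y in points
                    if (x - mx) ** 2 + (y - my) ** 2 <= h2]
            if not near:               # guard against rounding at the boundary
                shifted.append((mx, my))
                continue
            shifted.append((sum(x for x, _ in near) / len(near),
                            sum(y for _, y in near) / len(near)))
        modes = shifted
    # Merge modes that converged to (almost) the same location.
    seeds = []
    for m in modes:
        if all((m[0] - s[0]) ** 2 + (m[1] - s[1]) ** 2 > h2 / 4 for s in seeds):
            seeds.append(m)
    return seeds


# Noisy GPS fixes around two locations (hypothetical coordinates).
fixes = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
seeds = mean_shift_seeds(fixes, bandwidth=1.0)
```

Because each seed is an average over many noisy fixes, it is more tolerant of individual trajectory errors than any single GPS point, which is the fault-tolerance property the paragraph above refers to.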
High-precision maps are the foundation of autonomous driving. Autonomous driving systems impose demanding requirements for accuracy and need maps to correctly reflect the real world. This includes the complete and accurate depiction of roads, lanes, and surroundings. Amap requires position readings to be accurate to within 10 centimeters. After high-precision map data is collected, we need to solve the following problems:
Challenges in high-precision maps:
(1) Data alignment problem: Even given high-precision information and highly accurate vehicle positions, the error is still greater than 1 meter. By aligning the data collected from multiple sources, we can improve the precision to within 5 centimeters. We also need to deal with vegetation cover and other scenarios while maintaining the rigidity of the trajectory. For different upstream and downstream observation angles and point clouds, the frontend needs to perform matching based on iterative closest point (ICP), semantics, features, and shapes, and the backend then conducts large-scale alignment and smoothing of the alignment information.
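As an illustration of the ICP part of this step, here is a minimal 2D sketch: each iteration matches every source point to its nearest target point, solves for the best rigid transform in closed form, and applies it. Real point clouds are 3D and need outlier handling and spatial indexing. The function names and the brute-force matching are assumptions for illustration only.

```python
import math


def fit_rigid_2d(source, target):
    """Least-squares rotation angle and translation mapping source[i] to target[i]."""
    n = len(source)
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(q[0] for q in target) / n
    ty = sum(q[1] for q in target) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(source, target):
        ax, ay = px - sx, py - sy          # centered source point
        bx, by = qx - tx, qy - ty          # centered target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)         # rotation that best aligns the two sets
    c, s = math.cos(theta), math.sin(theta)
    shift = (tx - (c * sx - s * sy), ty - (s * sx + c * sy))
    return theta, shift


def transform(points, theta, shift):
    """Rotate by theta, then translate by shift."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1])
            for x, y in points]


def icp(source, target, iterations=10):
    """Iterative closest point with brute-force nearest-neighbor matching."""
    current = list(source)
    for _ in range(iterations):
        matched = [min(target, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in current]
        theta, shift = fit_rigid_2d(current, matched)
        current = transform(current, theta, shift)
    return current


# Two scans of the same landmarks, offset by a small rotation and translation.
scan_a = [(0.0, 0.0), (10.0, 0.0), (0.0, 5.0), (7.0, 9.0)]
scan_b = transform(scan_a, 0.05, (0.3, -0.2))
aligned = icp(scan_a, scan_b)
```

The transform is rigid by construction (rotation plus translation only), so the shape of the aligned data is preserved, which relates to the requirement above to maintain the rigidity of the trajectory.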
(2) Identification: Many objects on the road need to be identified, such as lane lines, road edges, markings on the road, and OBJ objects including poles, signs, and bridges. The recognition accuracy requires a recall rate of over 99%, and the positional accuracy must be within 10 centimeters. However, due to the frequent changes to scenarios, training samples are very limited. By integrating point clouds and image algorithms and combining deep learning with traditional algorithms, Amap reuses metric data and models in conventional maps to support data backflow and model iteration with a priori information.
(3) From highways to the city: The preceding problems are not very serious on highways, but in city driving, we must deal with a complex network of many different road types, including highways, main avenues, and small roads. Many different signs are complex and change frequently, with complex and variable scenarios, congestion, and blockages.
The problem of inaccurate positioning occurs frequently in urban canyons with tall buildings on all sides. In the figure below, the red dots indicate real locations, while the yellow dots indicate GPS locations. As you can see, there is a significant deviation between the two.
The visual positioning method can help solve inaccurate positioning. Currently, all mobile phones have cameras, as do an increasing number of cars. We are considering how to use image and video technology to establish a common visual positioning method for vehicles and people, both indoors and outdoors. Specifically, there are several technical options to choose from. Simultaneous Localization and Mapping (SLAM) and Visual Inertial Odometry (VIO) are methods of locating relative positions. To locate absolute positions, 3D reconstruction from sparse features combined with Perspective-n-Point (PnP) can be used to obtain more accurate results. Another option is to combine a vector map with real-time detection algorithms that detect objects such as signs and lines. These algorithms require high computing power and accuracy. Amap is also exploring deep learning methods that can balance accuracy and robustness.
If we are given a picture, we want to know its location and angle. To do this, we need a reference image with a known position. Then, we can use real-time images to obtain efficient positioning performance, providing results in milliseconds throughout the road network and under any weather conditions.
How can the navigation experience be improved? In the real world, we encounter many complicated scenes. Amap wants to provide a "what you see is what you get" (WYSIWYG) navigation experience. Traditional navigation systems output images that resemble the one shown on the left in the following figure. It would be better to use augmented reality (AR) to output a navigation image like the one shown on the right.
Amap released an AR navigation product in April 2019. The following figure shows the navigation view provided by our AR navigation system. The green route guides the car and indicates how it should turn. There are also detection and collision warning functions based on your distance from the car in front as well as a driving assistance safety function, which we are working to gradually improve.
Unlike current self-driving technology, AR navigation hopes to solve technical problems through simpler means. At present, Amap's single-camera sensor can provide better navigation performance while using only one-fifth of the computing power of a mobile phone chip.
AR navigation vehicle detection example: As shown in the following figure, the navigation system detects the vehicle in front of you. To efficiently detect vehicles in front of you, the model must be compressed, trained, and optimized by using an a priori scale. Real-time trajectory tracking can be performed based on detection.
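The source does not say how the distance to the vehicle in front is computed from a single camera. One common approach, shown here purely as an assumption, uses the pinhole model with a prior on real vehicle width, and then derives a time-to-collision estimate from two consecutive distance readings. The function names, the 1.8 m width prior, and the focal length are hypothetical.

```python
def distance_from_width(focal_px, real_width_m, bbox_width_px):
    """Pinhole model: an object of known width looks smaller the farther away it is."""
    return focal_px * real_width_m / bbox_width_px


def time_to_collision(prev_distance_m, curr_distance_m, dt_s):
    """Seconds until contact if the closing speed stays constant, else None."""
    closing_speed = (prev_distance_m - curr_distance_m) / dt_s
    if closing_speed <= 0:
        return None                    # the gap is steady or growing
    return curr_distance_m / closing_speed


# A detected car whose bounding box is 90 px wide, assuming a 1.8 m wide car
# and a focal length of 1000 px (both values are illustrative).
gap = distance_from_width(1000, 1.8, 90)
ttc = time_to_collision(20.0, 18.0, 0.5)   # the gap shrank by 2 m in 0.5 s
```

A warning could then be raised when the estimated time to collision falls below a chosen threshold, with tracking across frames used to smooth out noisy detections.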
Example of segmenting AR navigation lane lines: Lane line detection and vehicle detection follow the same principles. First, the model is compressed. Then, based on a multi-task model, multiple target detection algorithms are integrated to improve model reusability. Finally, we train and optimize the model, perform curve fitting, and integrate detection and tracking.
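The curve-fitting step can be sketched as a least-squares polynomial fit through the pixels that the segmentation model labels as lane line. The example below fits a quadratic by solving the normal equations with Gaussian elimination. The function name `fit_polynomial` and the choice of a quadratic are assumptions for illustration.

```python
def fit_polynomial(xs, ys, degree=2):
    """Least-squares fit; returns [c0, c1, ...] for c0 + c1*x + c2*x**2 + ..."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[float(sum(x ** (i + j) for x in xs)) for j in range(n)] for i in range(n)]
    b = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


# Lane pixels sampled from a gently curving line (hypothetical values).
rows = [0, 1, 2, 3, 4]
cols = [1.0 - 2.0 * r + 0.5 * r * r for r in rows]
lane = fit_polynomial(rows, cols)
```

Representing each lane line by a few coefficients rather than a pixel mask makes it cheap to track the line from frame to frame and to draw a smooth overlay.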
Example of the AR navigation guiding line: Based on the semantic segmentation and regression model, the guiding line in the AR navigation view is integrated with traditional GPS navigation to improve the performance of the model.
The following figure shows the actual display of the Amap AR navigation system. AR navigation can warn of pedestrians in real-time or remind you that a traffic light has changed.