Application of Machine Learning in Traffic Sign Detection and Fine-grained Classification

This article describes the application of machine learning in map data generation and traffic sign recognition for AMAP.

In today's digitally connected world, maps are an essential part of everyday life, and data is indispensable for map services. Dynamic map services provide users direct and explicit access to smart features supported by a large amount of data.

In the early stages of map services technology, data was collected through a range of specialized means, such as vehicles, bicycles, airplanes, and satellite images. In the last two years, map data has also been collected through crowdsourcing via intelligent hardware, which allows the data to be updated faster and more accurately than before. Rapid changes on the ground are driving users to depend increasingly on map services apps. As demand grows, map companies that focus on user experience treat the speed and accuracy of data updates as their key objective. The first step toward effective data updates is traffic sign detection.

This article describes how machine learning is applied to map data generation for AMAP. The technical solutions and designs outlined below have been verified and have achieved good results. They also provide a basic technical guarantee for the rapid update of AMAP data.

What Is Traffic Sign Detection?

Traffic sign detection is the process of automatically detecting traffic signs, such as speed limit signs, U-turn prohibition signs, crosswalk signs, and electronic eyes (traffic cameras), in images of street scenes. The detection results are delivered to a production process that generates map data for map services users.

Challenges in Traffic Sign Detection

The key challenges of traffic sign detection are the wide variety of traffic sign forms and the sensitivity of image capture to the natural environment. In addition, traffic sign detection places strict requirements on algorithm performance in order to achieve fast data updates and high data accuracy. The specific challenges are as follows:

1. Vast Variations in Sample Forms

Traffic signs vary hugely in the following aspects:

  • Diverse Types: Traffic signs are classified into hundreds of categories according to national standards.
  • Diverse Forms: The common forms of traffic signs include triangle, circle, square, diamond, and octagon. The physical facilities of traffic signs include ground markings, electronic eyes, traffic lights, height limit poles, and fences.
  • Extensive Color Distribution: The common colors of traffic signs include yellow, red, blue, green, black, and white.
  • Huge Image Size Differences: The sizes of traffic sign images range from hundreds of pixels (such as square plates and crosswalk signs) to dozens of pixels (such as electronic eyes).

Figure 1: Common Traffic Signs (Signage)

2. Susceptibility to Changes under Natural Conditions

Traffic signs may be obstructed by vehicles or trees, or worn out under natural conditions. In addition, image collection may be affected by the weather or the season, resulting in blurred images and color distortion.

Figure 2: Traffic Signs Captured Under Natural Conditions

Algorithm accuracy is also severely impaired by the misidentification of objects that resemble traffic signs, such as business placards and public welfare billboards.

Figure 3: Examples of signs that resemble traffic signs and generate noise during traffic sign detection

3. Performance Requirements

The following are the requirements for algorithm performance to achieve fast data updates and high data accuracy:

  • High Recall and Accuracy: AMAP sets high requirements for recall and accuracy in various scenarios. A missed detection (failed recall) may delay data updates, while a false detection (incorrect recall) may affect the overall efficiency and operation cycle, thereby impacting real-time data updates.
  • Throughput: AMAP processes hundreds of millions of images every day and therefore requires efficient algorithms that process data promptly to ensure timely map data updates.
  • Extensibility: Traffic signs differ between countries and regions, and may also change as national standards are revised over time. The algorithms must therefore be highly extensible so that they can quickly adapt to new traffic signs.
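The two error types in the first requirement can be made concrete with a small calculation. The sketch below assumes that "accuracy" in this article corresponds to what is usually called precision; the function name and the counts are hypothetical and are not AMAP figures.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from raw detection counts."""
    detected = true_positives + false_positives   # everything the detector reported
    actual = true_positives + false_negatives     # every sign really present
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 90 correct detections, 10 false detections
# (for example billboards), and 30 signs that were missed.
precision, recall = precision_recall(90, 10, 30)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Missed signs lower recall and delay updates, while false detections lower precision and add work downstream.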

Traffic Sign Detection Solution for AMAP

In academic research, deep learning models for target detection are typically trained end to end to achieve globally optimal detection results. The end-to-end mode is easy to use: it only requires annotating samples of hundreds of objects and feeding them into a deep learning framework for iterative training to obtain the final model.

End-to-end methods are divided into two-stage methods (such as Faster R-CNN[1]) and one-stage methods (such as YOLO[2] and SSD[3]). In actual application, however, the following issues must be noted:

  • High Sample Annotation Cost: All training samples need to be annotated for all categories. Therefore, when a new category is added, all historical training samples must be annotated again for that category, which is extremely costly.
  • No Support for Single-category Iteration: Traffic signs occur at varying frequencies, and some traffic signs take precedence over others. AMAP therefore requires especially high recall and accuracy for certain categories, such as electronic eyes and speed limit signs. However, the end-to-end mode requires full-category iteration and does not support single-category iteration, which increases the cost of algorithm iteration and testing.
  • Complex Model Training: Traffic signs come in hundreds of categories and occur at hugely varying frequencies. The large number of categories, poor convergence, and the imbalance between recall and accuracy make it difficult to train a single target detection model.

Considering the development of common target detection technologies and AMAP's requirements for traffic sign detection, we selected Faster R-CNN as the basic detection framework. It gives better detection results, especially for small targets, and its independent region proposal network (RPN) can meet the extensibility requirements. We also carried out targeted optimization and adjustment for speed.

Figure 4: Target detection and fine-grained classification for traffic sign detection

For actual application, we divide the detection framework into the following two phases:

1. Target Detection

Target detection is the process of detecting all traffic signs in captured images with Faster R-CNN and classifying them in a coarse-grained manner, with an emphasis on high recall and fast execution.

In practice, the following policies are adopted to improve algorithm capability:

  • Results: Detection targets are classified into N categories, such as circles, triangles, squares, and crosswalk signs with unusual aspect ratios. A dedicated RPN is configured for each category, with anchor ratios and scales designed around the typical dimensions of that category.
    Feature maps from different layers are fed to different RPNs as needed to make the design more targeted. A variety of sample enhancement methods are used to address the uneven distribution of sample types, and the sample distribution is further adjusted during training with OHEM and other methods. The detection results are further improved with IoU-Net, Soft-NMS, and other solutions.
  • Performance: Various categories share the basic convolution layer to ensure that the detection time does not rise excessively.
  • Extensibility: Under ideal conditions, only one RPN needs to be added to independently iterate a single new category, without affecting the results of other categories. As shown in Figure 5, RPN 1 and RPN 2 are independent of each other.
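The idea of giving each coarse category its own anchor design can be sketched as a configuration table. This is a minimal illustration only: the category names, scales, and ratios below are hypothetical, and the anchor formula is the common convention in which the ratio is height divided by width and the area stays close to scale squared, not necessarily the one used in production.

```python
import math

# Hypothetical per-category RPN settings (illustrative values only).
RPN_CONFIGS = {
    "circle": {"scales": [16, 32, 64], "ratios": [1.0]},
    "crosswalk": {"scales": [64, 128, 256], "ratios": [0.25, 0.5]},
}


def anchor_shapes(scales, ratios):
    """Return (width, height) pairs; ratio = height / width, area ~ scale ** 2."""
    shapes = []
    for scale in scales:
        for ratio in ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            shapes.append((round(width, 1), round(height, 1)))
    return shapes


def build_anchor_table(configs):
    """Build one independent anchor set per category."""
    return {name: anchor_shapes(cfg["scales"], cfg["ratios"])
            for name, cfg in configs.items()}


anchor_table = build_anchor_table(RPN_CONFIGS)
print(anchor_table["crosswalk"][0])  # widest anchor shape at the smallest scale
```

Under this layout, supporting a new category means adding one entry to the table; the anchor sets of the existing categories are left unchanged, which mirrors the extensibility point above.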

Figure 5: Schematic Drawing of a Multi-RPN Design
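Soft-NMS, one of the post-processing methods named in the target detection policies, lowers the scores of overlapping boxes instead of deleting them outright. The following is a minimal pure-Python sketch of the Gaussian variant. The function names, the default sigma, and the score threshold are assumptions made for illustration and do not describe the AMAP implementation.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS; returns (box_index, decayed_score) in selection order."""
    current = list(scores)                 # working copy; the input is not modified
    remaining = list(range(len(boxes)))
    kept = []
    while remaining:
        best = max(remaining, key=lambda i: current[i])
        if current[best] < score_threshold:
            break                          # every remaining box scores even lower
        remaining.remove(best)
        kept.append((best, current[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            current[i] *= math.exp(-(overlap ** 2) / sigma)  # decay, do not delete
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
for index, score in soft_nms(boxes, scores):
    print(index, round(score, 3))
```

In this example the second box overlaps heavily with the first, so its score is reduced and it is ranked after the distant third box rather than being discarded.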

2. Fine-grained Classification

Fine-grained classification is the process of classifying the candidate boxes acquired during the target detection phase in a fine-grained manner and filtering out noise to ensure high recall and accuracy. In actual implementation, the following policies are used to improve the results:

  • An independent fine-grained network is configured for each category without interfering with other categories. Each category is iterated independently and in parallel with other categories. This enables concurrent R&D by multiple people and effectively shortens the R&D cycle.
  • Networks of varying computational complexity are selected for fine-grained classification and noise suppression based on the difficulty of each category. This avoids efficiency bottlenecks arising from highly complex categories.
  • Samples are collected and annotated independently for each category, which greatly improves the efficiency of building training and test sets. As shown in Figure 6, a simple network can be used to process circular traffic signs, as they are easily distinguished from other types of traffic signs. For square traffic signs, positive and negative samples must be distinguished based on the layout and content of the text, so a deeper network must be used for classifying such samples.

Figure 6: Modular Schematic Drawing of Fine-grained Classification
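The modular design can be pictured as a registry that routes each candidate from the detection phase to the classifier of its coarse category. The sketch below is illustrative only: the two classifier functions are simple rule-based stand-ins for trained networks, and all names and fields are hypothetical.

```python
def circle_classifier(crop):
    """Stand-in for a lightweight network; returns a label, or None for noise."""
    return "speed_limit" if crop.get("has_digits") else None


def square_classifier(crop):
    """Stand-in for a deeper network that inspects text layout and content."""
    return "direction_sign" if crop.get("has_road_names") else None


# One independent classifier per coarse category (hypothetical registry).
CLASSIFIERS = {"circle": circle_classifier, "square": square_classifier}


def refine(candidates, classifiers):
    """Route (coarse_category, crop) pairs to their classifier and drop noise."""
    results = []
    for coarse_category, crop in candidates:
        classifier = classifiers.get(coarse_category)
        if classifier is None:
            continue  # no fine-grained model registered for this category yet
        label = classifier(crop)
        if label is not None:  # None means the candidate was rejected as noise
            results.append((coarse_category, label))
    return results


candidates = [
    ("circle", {"has_digits": True}),
    ("square", {"has_road_names": False}),  # e.g. a billboard, rejected as noise
    ("triangle", {}),                        # category without a classifier
]
print(refine(candidates, CLASSIFIERS))
```

Because each entry in the registry is independent, one category's classifier can be retrained or swapped for a deeper network without touching the others.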

Fine-grained classification uses multiple models, which increases the GPU memory (video RAM) usage of the server and places additional demands on computing resources. To address these concerns, we optimized the deep learning framework by dynamically allocating temporary buffers and sharing them among models, and by cropping out the backpropagation functions of the framework. These measures reduce video RAM usage by more than 50%.
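The effect of sharing temporary buffers can be illustrated with a toy calculation. The sketch assumes that the models run one after another, so that a single scratch buffer sized for the largest request can serve all of them. The class, the function, and the byte counts are hypothetical and are not intended to reproduce the figure reported above.

```python
class SharedScratchPool:
    """One reusable scratch buffer, grown to the largest request seen so far."""

    def __init__(self):
        self._buffer = bytearray()

    def acquire(self, size):
        """Return a view of `size` bytes, reallocating only when too small."""
        if size > len(self._buffer):
            self._buffer = bytearray(size)
        return memoryview(self._buffer)[:size]

    @property
    def capacity(self):
        return len(self._buffer)


# Hypothetical temporary-buffer requirement of each model, in bytes.
MODEL_SCRATCH_BYTES = {"circle": 4_000, "triangle": 6_000, "square": 10_000}


def peak_scratch_bytes(requirements, shared):
    """Peak temporary memory with private buffers versus one shared pool."""
    if not shared:
        return sum(requirements.values())  # every model keeps its own buffer
    pool = SharedScratchPool()
    for size in requirements.values():
        pool.acquire(size)                 # sequential use: one buffer serves all
    return pool.capacity


print(peak_scratch_bytes(MODEL_SCRATCH_BYTES, shared=False))
print(peak_scratch_bytes(MODEL_SCRATCH_BYTES, shared=True))
```

With private buffers the peak is the sum of all requirements, while with sharing it is only the largest single requirement. Removing the backward-pass code saves memory in a different way, since gradients and the intermediate values kept for them are not needed at inference time.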

Results and Benefits

The solution described in the preceding section has been officially launched. Its recall and accuracy meet the production requirements, and the average daily image throughput is more than 10 million. Figure 7 shows some results of the solution (different boxes indicate different detection results).

Figure 7: Results of Traffic Sign Detection

Traffic sign detection technology applied to AMAP helps to effectively improve the data production efficiency of AMAP and achieve the goal of updating map data at an approximate speed of T+0 (zero time difference).

Currently, we also use machine learning technology to automate data production and to further narrow the differences between the real world and map data, so as to "connect the real world and make travel better."
