"Can" and "Can't" on the road to autonomous driving

One: Principles and Technical Overview of Autonomous Driving

Any technical field can be evaluated along two dimensions: its technical difficulty and challenges, and its market size and social impact. A technology that is very difficult but has a small market and little social impact is not worth investing in. A technology with a large market and social impact but little technical difficulty does not make full use of technical talent.

Autonomous driving scores high on both dimensions: it has a very large market and social impact, and it poses great technical difficulties and challenges. In terms of market size and social impact, people around the world spend hundreds of millions of hours driving every day; if that time were freed up for other work, it would generate considerable economic benefits. In terms of technical difficulty, driving a car is already a relatively complex task among civilian applications, so one can imagine how hard it is to make driving automated and intelligent.

1 Introduction to related concepts
As shown in the figure below, smart driving, automatic driving, and unmanned driving are nested concepts: each step up in technical capability narrows the scope, so smart driving contains automatic driving, which in turn contains unmanned driving.


unmanned driving

The car can complete all driving tasks in a limited environment, or even in all environments, without any intervention from the driver.


automatic driving

At least some of the safety-critical control functions of the car (such as steering, acceleration, and braking) are performed automatically, without direct operation by the driver. Automatic driving includes both unmanned driving and assisted driving.

smart driving

Smart driving includes automatic driving as well as other driver-assistance technologies, such as voice warnings and intelligent human-machine interaction, which can assist or even replace the driver in certain parts of the driving task to improve the driving experience.

Autonomous Driving Grading Standards

The automation levels defined in the SAE International (Society of Automotive Engineers) standard J3016 are currently the grading scheme generally accepted and adopted by the autonomous driving field and the international community. The standard defines six levels, from L0 (no driving automation) to L5 (full driving automation); the discussion below focuses on L1 through L5.

L1 and L2 are called assisted driving. The driver remains the main operator of the vehicle and the responsible party, and the automated system takes over part of the driving task. At L1, the system can, within its operating conditions, continuously perform either lateral (steering) or longitudinal (accelerator, brake) vehicle motion control, but not both at the same time. At L2, the system can perform lateral and longitudinal vehicle motion control simultaneously.

At L3 to L5, the automated driving system performs the entire dynamic driving task, and the system becomes the main operator and responsible party. At L3, the driver serves as a fallback and must take part in driving when the system requests it. At L5, the system undertakes all driving tasks and the driver never needs to participate.

Among these levels, L3 remains debatable: is there really a workable scenario in which the driver's eyes are freed, yet the driver must be ready to take over at any moment? From the user's perspective, it is doubtful whether L3 can be user-friendly. For example, a user in an L3 vehicle may be playing with a mobile phone when a system emergency requires them to take over within 10 seconds. From a technical perspective, time is life in traffic, and an accident can happen in a very short time. Requiring the system to recognize the emergency and keep the vehicle safe for the 10 seconds or so until the driver responds may already exceed the technical capabilities of an L3 system.
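To put the 10-second takeover window in perspective, the sketch below computes how far a vehicle travels during it. It is simple kinematics under an illustrative assumption that the vehicle holds a constant speed and does not brake; at 120 km/h the vehicle covers about 333 m before the driver is back in control.

```python
def kmh_to_ms(speed_kmh: float) -> float:
    """Convert a speed from km/h to m/s."""
    return speed_kmh / 3.6


def distance_during_takeover(speed_kmh: float, takeover_s: float = 10.0) -> float:
    """Metres covered during the takeover window, assuming constant speed (no braking)."""
    return kmh_to_ms(speed_kmh) * takeover_s


if __name__ == "__main__":
    for speed in (30, 60, 120):
        print(f"{speed:>3} km/h: {distance_during_takeover(speed):6.1f} m in 10 s")
```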

The levels above can also be understood from another perspective, as shown in the figure below. At L1 and L2 the driver no longer has to operate the accelerator, brake, or steering wheel (L1 frees either the hands or the feet, L2 frees both), but must still monitor the driving scene. L3 also frees the driver's eyes, but the driver must respond when the system requests it in special situations. At L4 and L5, the driver does not need to participate in driving at all.
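The level descriptions above can be captured in a small lookup table. This sketch encodes this article's summary, not the normative text of J3016; the field names and the `responsible_party` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutomationLevel:
    level: int
    name: str
    hands_feet_free: bool      # system handles steering and/or pedals
    eyes_free: bool            # driver need not monitor the scene
    driver_is_fallback: bool   # driver must take over on request


# Paraphrase of this article's reading of SAE J3016, not the normative text.
LEVELS = {
    0: AutomationLevel(0, "No driving automation", False, False, True),
    1: AutomationLevel(1, "Driver assistance", True, False, True),
    2: AutomationLevel(2, "Partial driving automation", True, False, True),
    3: AutomationLevel(3, "Conditional driving automation", True, True, True),
    4: AutomationLevel(4, "High driving automation", True, True, False),
    5: AutomationLevel(5, "Full driving automation", True, True, False),
}


def responsible_party(level: int) -> str:
    """L0-L2: the driver is responsible; L3-L5: the automated system is."""
    if level not in LEVELS:
        raise ValueError(f"unknown SAE level: {level}")
    return "driver" if level <= 2 else "system"
```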

2 Different development routes of enterprises
Depending on their business models and technical strengths, companies target different levels of autonomous driving and adopt different overall development routes.

At present, car companies, represented by Tesla, mostly take an incremental route, using L1 and L2 technology to assist drivers and improve the driving experience. Internet and high-tech companies such as Google focus more on R&D for L4 and L5 autonomous driving.

For Internet and other technology companies, assisted driving places lower demands on algorithms, depends more on hardware, and therefore creates little value for them. By researching L4 autonomous driving instead, they can play to their strengths in innovative technology. Car companies, whose strength is the hardware foundation, start from L1 and L2 and raise their autonomous driving capability step by step, which better fits their development needs.

Companies also make different choices of sensor solutions and decision-making algorithms. These differences arise because the technology in each subfield is not yet settled, and they also reflect each company's overall development strategy. Driverless technology still faces many challenges.

3 Autonomous driving technical principles and technical overview
Autonomous driving systems at different levels share a similar framework; they differ in the functions they develop because they have different requirements for precision and functional coverage. The core technical framework has three parts, environment perception, decision-making and planning, and control execution, which mirror how a human drives.

environment perception

Human drivers observe the environment with their eyes and ears and understand where they and other traffic participants are and what state they are in. Environment perception in autonomous driving obtains similar information through sensors and perception algorithms, covering both localization and perception of the surroundings.

decision-making and planning

After obtaining environmental information, the system uses decision-making algorithms running on the computing platform to plan the driving path and related actions while ensuring safety.

control execution

Using control algorithms and a drive-by-wire system, the vehicle executes driving operations that follow the planned path.
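The three-stage loop above can be sketched as three functions chained once per control cycle. This is a toy illustration: the single-lane obstacle model, the 20 m safe gap, and the proportional speed controller are assumptions chosen for readability, not a description of any production stack.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Obstacle:
    gap_m: float      # distance ahead of the ego vehicle in its lane
    speed_mps: float  # obstacle speed along the lane


@dataclass
class Plan:
    target_speed_mps: float


@dataclass
class Command:
    throttle: float  # 0..1
    brake: float     # 0..1


def perceive(ranges_m: List[float], speeds_mps: List[float]) -> List[Obstacle]:
    """Environment perception: turn (already fused) sensor readings into obstacles."""
    return [Obstacle(g, v) for g, v in zip(ranges_m, speeds_mps)]


def plan(obstacles: List[Obstacle], cruise_mps: float, safe_gap_m: float = 20.0) -> Plan:
    """Decision-making and planning: match a too-close obstacle's speed, else cruise."""
    target = cruise_mps
    for ob in obstacles:
        if ob.gap_m < safe_gap_m:
            target = min(target, ob.speed_mps)
    return Plan(target)


def control(p: Plan, ego_speed_mps: float, gain: float = 0.2) -> Command:
    """Control execution: proportional speed control mapped to throttle or brake."""
    u = max(-1.0, min(1.0, gain * (p.target_speed_mps - ego_speed_mps)))
    return Command(throttle=max(u, 0.0), brake=max(-u, 0.0))


def drive_step(ranges_m, speeds_mps, ego_speed_mps, cruise_mps=15.0) -> Command:
    """One perception -> planning -> control cycle."""
    return control(plan(perceive(ranges_m, speeds_mps), cruise_mps), ego_speed_mps)
```

Keeping the stages as separate functions mirrors the module boundaries described in the text, so each stage can be tested or replaced independently.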

As shown in the figure above, these three core technologies involve many modules.


algorithms

The algorithms include control, localization, perception, and decision-making algorithms. In terms of maturity, control algorithms can largely meet the technical requirements, and in Alibaba's current practice, localization algorithms meet the accuracy requirements in most cases. Perception algorithms are expected to accurately identify the category, position, speed, and heading of objects in the surrounding environment, but they still suffer from problems such as sensor noise. Decision-making algorithms must cope with such noise and efficiently plan executable paths. Perception and decision-making are the current bottlenecks of autonomous driving technology and need further optimization.
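The article does not say how the perception stack suppresses noise. One simple, widely used technique is the alpha-beta filter, a fixed-gain simplification of the Kalman filter that estimates an object's position and speed from noisy range readings. The sketch below is illustrative only; the gains `alpha=0.5` and `beta=0.1` are assumptions, not tuned values.

```python
from typing import List, Tuple


def alpha_beta_track(positions_m: List[float], dt: float,
                     alpha: float = 0.5, beta: float = 0.1) -> List[Tuple[float, float]]:
    """Estimate (position, velocity) from noisy 1D position measurements.

    Constant-velocity alpha-beta filter: predict with the current velocity,
    then correct position and velocity by fixed fractions of the residual.
    """
    x, v = positions_m[0], 0.0
    estimates = [(x, v)]
    for z in positions_m[1:]:
        x_pred = x + v * dt               # predict assuming constant velocity
        residual = z - x_pred             # measurement minus prediction
        x = x_pred + alpha * residual     # correct position
        v = v + (beta / dt) * residual    # correct velocity
        estimates.append((x, v))
    return estimates
```

Larger gains react faster to real manoeuvres but let more noise through; smaller gains smooth more but lag, which is the same trade-off a Kalman filter resolves by adapting its gains.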


sensors

Different sensor configurations can be chosen for different solutions and levels. For example, L2 systems mostly use cameras and millimeter-wave radar, whereas L4 systems require lidar. Lidar has its own problems, such as stability. Mechanical lidar is mainly used at present; although solid-state lidar is progressing rapidly, practice so far shows that it cannot yet meet the stability requirements of autonomous driving.

computing platform

The computing platform must provide high computing power at low power consumption. Because the upper-layer algorithms are not yet well defined, it is difficult to design or optimize chips tailored to them.

Test method

Testing includes real-road testing and simulation-based regression testing. Simulation regression testing is a hot topic in autonomous driving, and many technical problems remain in simulating the driving environment and realistic driver behavior.
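A simulation regression test replays a fixed library of scenarios against each new software build and flags any scenario whose safety score got worse. The minimal harness below assumes higher scores are better; in practice `simulate` would run the new build through a full simulator, while here a closed-form time-to-collision stands in as a hypothetical metric.

```python
from typing import Callable, Dict, List

Scenario = Dict[str, float]


def time_to_collision_s(scenario: Scenario) -> float:
    """Toy metric: seconds until contact if the gap keeps closing at a constant rate."""
    closing = scenario["closing_speed_mps"]
    return float("inf") if closing <= 0 else scenario["gap_m"] / closing


def run_regression(scenarios: Dict[str, Scenario],
                   simulate: Callable[[Scenario], float],
                   baseline: Dict[str, float],
                   tolerance: float = 0.05) -> List[str]:
    """Replay every scenario and return those whose score fell below the baseline.

    A scenario regresses if its new score is more than `tolerance` below the
    score recorded for the previous release.
    """
    regressed = []
    for name, scenario in scenarios.items():
        score = simulate(scenario)
        if score < baseline.get(name, float("-inf")) - tolerance:
            regressed.append(name)
    return sorted(regressed)
```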

Two: What Autonomous Driving Can and Cannot Do
The figure below summarizes the current state of development of autonomous driving technology.

1 L1, L2
Assisted driving systems, such as Tesla's, have been commercialized at scale, and they will be installed on more and more vehicles in the next few years.

L3 is controversial and is not discussed here.

2 L4
L4 autonomous driving can be divided into two categories by application scenario.

Medium- and high-speed public-road L4

Examples include shared robotaxis and self-driving logistics vehicles on expressways. On the algorithm side, according to Waymo's latest takeover data, its autonomous vehicles require a human driver to take over about once every 13,000 miles, and this figure roughly doubles every two years. The comparable figure for human drivers is about 50,000 miles. Judging by MPI (miles per intervention) alone, it will take about 4 years for self-driving vehicles to reach the level of human drivers. Moreover, even when MPI reaches that standard, it remains to be seen whether the driving behavior and user experience of autonomous vehicles are also up to standard.
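The "about 4 years" estimate follows from the doubling assumption applied to the takeover figures quoted above (13,000 and 50,000 miles): solving 13,000 × 2^(t/2) = 50,000 gives t = 2 · log2(50,000 / 13,000) ≈ 3.9 years. The sketch below only automates that arithmetic, so it inherits every assumption behind the quoted figures.

```python
import math


def years_to_reach(current_mpi: float, target_mpi: float, doubling_years: float = 2.0) -> float:
    """Years until MPI reaches the target if it doubles every `doubling_years`.

    Solves current * 2 ** (t / doubling_years) = target for t.
    """
    return doubling_years * math.log2(target_mpi / current_mpi)


print(f"{years_to_reach(13_000, 50_000):.2f} years")  # about 3.89 years
```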

Hardware development also takes time: for example, computing platforms can be optimized for specific algorithms only after those algorithms have settled into some standard form.

Another important question is whether laws and regulations allow self-driving vehicles on the road. Regulation has little tolerance for risk, so a breakthrough is possible only once the technology is mature and has been verified at a certain scale.

In summary, a breakthrough in L4 technology for medium- and high-speed public roads will take a long time, and productization and scaling still have a long way to go.

Low-speed L4

This covers autonomous driving in parks, residential communities, campuses, and similar scenarios. There, a vehicle can meet demand at low speed and can stop in time if sudden danger arises, so the accuracy requirements on the algorithms can be lowered by orders of magnitude. Because the algorithms are less complex, the hardware may not need custom chips and can be developed and optimized on existing embedded computing platforms. And because the safety risk is low, regulatory support is easier to obtain. Low-speed L4 is therefore expected to achieve a breakthrough in the near future.
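Why low speed relaxes the requirements can be seen from basic stopping-distance arithmetic, d = v·t_r + v²/(2a). With an assumed 0.5 s system reaction time and 6 m/s² braking (illustrative values, not figures from the article), a vehicle at 10 km/h stops within about 2 m, whereas at 60 km/h it needs about 31 m and carries 36 times the kinetic energy.

```python
def stopping_distance_m(speed_kmh: float, reaction_s: float = 0.5, decel_mps2: float = 6.0) -> float:
    """Distance travelled during the reaction time plus braking to a stop: v*t + v^2 / (2a)."""
    v = speed_kmh / 3.6
    return v * reaction_s + v * v / (2.0 * decel_mps2)


def kinetic_energy_ratio(fast_kmh: float, slow_kmh: float) -> float:
    """Kinetic energy scales with v^2, so for equal mass the ratio is (fast / slow) ** 2."""
    return (fast_kmh / slow_kmh) ** 2


for kmh in (10, 60):
    print(f"{kmh:>2} km/h: stops within {stopping_distance_m(kmh):.1f} m")
```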

Three: Alibaba’s progress and thinking on autonomous driving
1 Alibaba's autonomous driving mission
unmanned cargo

Of the two directions, passenger-carrying and cargo-carrying unmanned driving, Alibaba positions its autonomous driving as cargo-carrying, empowering smart logistics and making logistics more convenient and efficient.

business perspective

Platforms in the Alibaba economy, such as Tmall, Taobao, Ele.me, Hema, and Cainiao, generate more than 100 million parcels or takeaway orders every day, which requires enormous manpower. Autonomous driving can take on a large share of these logistics tasks.

Technical point of view

Cargo-carrying unmanned driving is much easier to achieve than passenger-carrying unmanned driving. First, it does not need to address user-experience issues such as ride comfort. Second, it reduces ethical dilemmas: there is no need to decide whether the safety of people inside the vehicle or people outside it takes priority in an accident.

Positioning its vehicles as unmanned cargo carriers therefore lets Alibaba meet logistics needs, avoid ethical dilemmas, and make technical implementation more feasible.

2 Last-mile unmanned logistics
Because the algorithm, hardware, and regulatory problems are easier to overcome here, last-mile logistics is expected to reach commercial deployment sooner. Cainiao's last-mile unmanned logistics vehicle has already been deployed at many colleges and universities, where it operates routinely and handles last-mile delivery, bringing great convenience to users. The hope is to roll it out to more universities and communities; last-mile unmanned logistics is expected to bring great changes to the market.

public-road urban delivery

This is still at the exploratory R&D stage.

3 Technical Layout

algorithms

These include control, localization, perception, and decision-making algorithms, as described above.


sensors and computing platforms

Alibaba is working on sensors and computing platforms in parallel. Sensors need to be customized and optimized; even the camera, a relatively mature sensor, sometimes falls short, for example in image quality when driving at night. For computing platforms, mass production and deployment of unmanned driving systems require embedded systems with low power consumption and high stability. Developing such embedded systems requires optimizing and accelerating software and hardware together, so achieving L4 autonomous driving with embedded systems and algorithms is not easy. We hope to achieve a breakthrough this year.

data and infrastructure

Simulation systems, high-precision maps, and related infrastructure are being optimized in parallel.

4 Algorithm Exploration
If the algorithm problems cannot be solved, subsequent work such as the computing platform is hard to carry out. AI algorithms have made tremendous progress over the past decade, but autonomous driving algorithms still face many problems that are difficult to handle.

Scene diversity and complexity

Autonomous driving algorithms must interpret traffic scenes and plan optimal driving routes, but real traffic scenes are complex and diverse, which poses a major challenge for these algorithms.

Consider a seemingly simple single scenario: assessing the risk of a rear-end collision when another vehicle overtakes and cuts in front of the self-driving vehicle. Because vehicle models, speeds, and trajectories differ, overtaking takes very different forms, and handling all of them with a single algorithm is very challenging.

Handling Diverse Scenarios - The No Free Lunch Theorem

There are two ways to approach diverse scenarios. The first is to develop one vastly superior algorithm that solves every problem, which is extremely difficult in practice. The second is to reduce the difficulty of the problem itself and solve it with engineering thinking.

In the figure below, the horizontal axis represents different problems and the vertical axis represents how well an algorithm solves each one. Loosely stated, the No Free Lunch theorem says that, averaged over all possible problems, no algorithm outperforms any other, so a single general algorithm cannot be the best at every problem. Algorithms should therefore be developed and optimized specifically for the problems in each scenario.

Likewise, autonomous driving scenarios are complex and diverse, and handling all of them with one general algorithm is very difficult. The scenes should therefore be refined into many sub-scenes, with a corresponding algorithm developed for each. This approach reduces the difficulty of each problem and makes it easier to solve.
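The divide-and-conquer idea can be sketched as a classifier that routes each situation to a handler written for its sub-scene. Everything here is hypothetical: the three cut-in sub-scenes, the feature names, and the deceleration values are placeholders, and in practice each handler would be a full planning algorithm rather than a constant.

```python
from typing import Callable, Dict

Observation = Dict[str, float]
Decision = Dict[str, float]


def classify_cut_in(obs: Observation) -> str:
    """Hypothetical rules assigning a cut-in event to one of three sub-scenes."""
    if obs.get("neighbor_lane_blocked", 0.0) > 0.5:
        return "forced_by_obstacle"   # neighbor must merge, so anticipate it
    if obs.get("neighbor_lateral_speed_mps", 0.0) > 1.0:
        return "aggressive"
    return "gentle"


def handle_forced(obs: Observation) -> Decision:
    return {"decel_mps2": 3.0}  # brake early, before the merge starts


def handle_aggressive(obs: Observation) -> Decision:
    return {"decel_mps2": 2.0}


def handle_gentle(obs: Observation) -> Decision:
    return {"decel_mps2": 0.5}


HANDLERS: Dict[str, Callable[[Observation], Decision]] = {
    "forced_by_obstacle": handle_forced,
    "aggressive": handle_aggressive,
    "gentle": handle_gentle,
}


def decide(obs: Observation) -> Decision:
    """Route each observation to the algorithm tuned for its sub-scene."""
    return HANDLERS[classify_cut_in(obs)](obs)
```

Because each handler only has to be good at its own sub-scene, it can be developed, tuned, and tested in isolation.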

There is no uniform standard for scene classification

As noted above, scene classification is the first step in overcoming the algorithm difficulties, but there is no unified standard yet, and establishing one for autonomous driving scenarios is difficult. As shown in the figure below, the industry has some classification standards, but they do not meet Alibaba's requirements.

Classification by driving environment is too coarse: each sub-scene is still very complicated, which makes it hard to start algorithm development. Classification by scene elements is geared toward testing rather than algorithm R&D; its categories overlap, and for some sub-scenes it is hard to develop targeted algorithms.

Alibaba Autonomous Driving Scenario Library

To address these problems, Alibaba built its own autonomous driving scene library. As shown in the figure below, it has the following characteristics.

Highly refined: for example, the overtaking scene mentioned above is divided into more than 20 categories. In the case shown on the right side of the figure below, a car suddenly cuts into the self-driving vehicle's lane from the left because of a roadblock in its own lane, and the self-driving vehicle must brake hard to avoid it. Without scene classification, the algorithm can only recognize the cut-in once it is under way and react quickly, which is a great challenge; in many such cases the reaction fails and a human driver has to take over. With scene classification, the system can anticipate that the vehicle on the left is likely to cut in in this scene and brake in advance, avoiding a takeover. This example also shows how scene classification benefits autonomous driving algorithms.

Dynamic scene interaction is the most difficult part of autonomous driving. Most general-purpose scene libraries in the industry rely on expert knowledge and manually designed classification criteria, and they focus mainly on static attributes (such as road type, vehicle model, and weather). Manually designed criteria stay largely at the semantic level and struggle to capture the dynamic behavior in driving scenes. Alibaba instead analyzes and clusters large amounts of road-test data to find challenging scenes, classifying them in a data-driven way to build a scene library based on dynamic behavior.
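The article says Alibaba clusters large volumes of road-test data to discover dynamic scene categories but does not name the clustering method. As a stand-in, the sketch below applies plain k-means (Lloyd's algorithm) to per-event feature vectors; the features (for example, relative speed and cut-in distance of an overtaking vehicle) are hypothetical, and a production system would more likely use richer trajectory features and a better initialization such as k-means++.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain k-means with a naive first-k initialization."""
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each event joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```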

5 AutoDrive Platform – Efficiently Handling Fine Scenes
Once the scene library is built, a targeted algorithm must be developed for each scene. With 1,000 sub-scenes, in the extreme case 1,000 algorithms must be developed, which demands a great many algorithm engineers. The prevailing development model is essentially manual labor plus intelligence. For example, developing decision-making algorithms relies heavily on the knowledge and experience of algorithm engineers, who hand-design hyperparameters, network structures, rules, and so on, which is very inefficient.

The history of artificial intelligence is a process of gradually replacing manual design with computation. For example, before deep learning, features (such as those describing the shapes in an image) were designed by hand, which was inefficient; deep learning replaced hand-crafted features with features learned automatically from data. In autonomous driving, replacing manual design with computation is not yet widespread. In decision-making and planning, for instance, a large share of the design is still manual, which on the one hand requires a lot of labor and slows development, and on the other hand cannot reach the optimal design.

With these factors in mind, Alibaba developed the AutoDrive platform, which uses computation to replace manual design and thereby improves the efficiency and quality of algorithm design. The goal is to automate most of the work (such as designing hyperparameters, network structures, and decision rules) through learning and search. AutoDrive has already made some progress and can perform automated learning for each module in the autonomous driving pipeline.
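The article does not describe AutoDrive's search algorithms. To illustrate the general idea of replacing hand-tuning with search, the sketch below uses random search, one of the simplest such methods: sample candidate parameters, score each with an objective, and keep the best. The `random_search` name and its interface are hypothetical.

```python
import random
from typing import Callable, Dict, Optional, Tuple

Params = Dict[str, float]


def random_search(objective: Callable[[Params], float],
                  space: Dict[str, Tuple[float, float]],
                  trials: int = 200,
                  seed: int = 0) -> Tuple[Optional[Params], float]:
    """Sample parameters uniformly from `space` and keep the lowest-cost set.

    `objective` would typically score a candidate by running it through
    simulation on the relevant sub-scenes (lower is better).
    """
    rng = random.Random(seed)
    best_params: Optional[Params] = None
    best_cost = float("inf")
    for _ in range(trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        cost = objective(params)
        if cost < best_cost:
            best_params, best_cost = params, cost
    return best_params, best_cost
```

Random search needs no gradients and parallelizes trivially, which matters when each evaluation is an expensive simulation; smarter strategies such as Bayesian optimization or evolutionary search trade that simplicity for fewer evaluations.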

Vision Application Case – NAS (Neural Architecture Search) for 2D Recognition and Detection

The AutoDrive platform automates the design of perception algorithms. The structure comparison in the figure below shows that manual design produces a more complex network. Because it is hard for humans to reason about high-dimensional spaces, designers tend to stack many structures commonly used in the industry and increase depth to improve accuracy, so manually designed structures inevitably contain a lot of redundancy. AutoDrive can find a simpler structure, shown on the right side of the figure below, that matches or even exceeds the accuracy of the complex hand-designed one.

Such streamlined structures matter in autonomous driving because the computing platform must be a low-power embedded system. Automated learning and search on AutoDrive can greatly reduce computational complexity, which lowers the resource consumption of the computing platform and makes it easier to design.
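How much a leaner structure saves can be estimated by counting multiply-accumulate operations (MACs): a k × k convolution producing an H' × W' × C_out output from C_in input channels costs H' · W' · C_out · C_in · k² MACs. The helper below applies that formula to a plain stack of convolution layers; it ignores biases, activations, and other layer types, and any layer shapes you feed it are your own assumptions rather than the networks in the figure.

```python
import math
from typing import List, Tuple

# (in_channels, out_channels, kernel_size, stride) for each conv layer
Layer = Tuple[int, int, int, int]


def conv_macs(h: int, w: int, c_in: int, c_out: int, k: int, stride: int = 1) -> int:
    """Multiply-accumulates of one k x k convolution with 'same' padding."""
    out_h, out_w = math.ceil(h / stride), math.ceil(w / stride)
    return out_h * out_w * c_out * c_in * k * k


def network_macs(layers: List[Layer], h: int, w: int) -> int:
    """Total MACs of a plain stack of conv layers applied to an h x w input."""
    total = 0
    for c_in, c_out, k, stride in layers:
        total += conv_macs(h, w, c_in, c_out, k, stride)
        h, w = math.ceil(h / stride), math.ceil(w / stride)
    return total
```

Because cost grows with the product of channel counts and kernel area, removing redundant layers or narrowing wide ones cuts compute far more than proportionally, which is what makes searched, streamlined structures attractive on embedded hardware.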


As shown in the figure below, beyond perception, AutoDrive and its refined scene library are also applied to other algorithm modules, such as decision-making, planning, and localization, to automate their design.

For example, after AutoDrive automatically learned and optimized the parameters of an intersection anti-collision strategy, its performance improved by 16.5% compared with manually designed rules and parameters.

To make the autonomous driving problem easier, it is broken down into scenarios. Dividing the cut-in scene into 25 categories and solving each one specifically improved performance by 18.7% compared with a single general algorithm, which also demonstrates the advantage of the refined scene library.

The autonomous driving cloud platform behind AutoDrive

Approaches like AutoDrive are not yet widely used in autonomous driving because they place high demands on the engineering platform behind them. AutoDrive is quite different from the similar AutoML platforms. AutoML mainly handles relatively simple data such as images and text, so its inputs and outputs are simple and evaluating a candidate only requires checking whether its classification results are correct. AutoDrive must process complex, multimodal, time-series driving signals (such as video) and evaluate its outputs by simulation on those signals. For example, verifying a change to decision-making and planning requires simulating the vehicle's trajectory after a parameter is changed. AutoDrive is therefore harder both algorithmically and in engineering.
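The kind of evaluation described above, scoring a decision parameter by simulating the trajectory it produces rather than by checking a label, can be illustrated with a deliberately tiny example. The hypothetical parameter is the distance at which the ego vehicle starts braking for a stopped obstacle; each candidate is rolled out in a one-dimensional kinematic simulation and scored by the gap left at the end. Real evaluation would replay multimodal sensor logs in a full simulator; this sketch only shows why every candidate needs its own rollout.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (time_s, position_m, speed_mps)


def simulate_stop(trigger_m: float, v0: float = 15.0, decel: float = 5.0,
                  obstacle_m: float = 50.0, dt: float = 0.01) -> List[Sample]:
    """Roll out the ego trajectory toward a stopped obstacle.

    The ego cruises at v0 and brakes at `decel` once the remaining gap is at
    most `trigger_m`. The rollout ends when the vehicle halts or reaches the obstacle.
    """
    t, x, v = 0.0, 0.0, v0
    trajectory = [(t, x, v)]
    while v > 0.0 and x < obstacle_m:
        if obstacle_m - x <= trigger_m:
            v = max(0.0, v - decel * dt)
        x += v * dt
        t += dt
        trajectory.append((t, x, v))
    return trajectory


def final_gap_m(trigger_m: float, obstacle_m: float = 50.0) -> float:
    """Score a candidate parameter: remaining gap at the end (<= 0 means contact)."""
    trajectory = simulate_stop(trigger_m, obstacle_m=obstacle_m)
    return obstacle_m - trajectory[-1][1]


for trigger in (20.0, 25.0, 30.0):
    print(f"brake at {trigger:.0f} m -> final gap {final_gap_m(trigger):.2f} m")
```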

AutoDrive must therefore be supported by large-scale engineering systems, including simulation, data labeling, data management, model training, computing resources, and computing platforms.

6 Synergy in Autonomous Driving Algorithm Development Based on the "No Free Lunch" Theorem
To address the algorithm bottleneck of autonomous driving, Alibaba proposes combining the following three methods to greatly accelerate algorithm R&D.

scene refinement

Decompose complex scene problems into simpler ones that are easier to solve.

Algorithm targeting

Develop and optimize algorithms specifically for each classified sub-scene to raise the success rate in every scene.

Efficient cloud platform

The AutoDrive platform replaces manual design with computation, reducing manual work and dependence on expert knowledge in autonomous driving, thereby improving R&D efficiency and quality.

Four: Summary and Outlook
1) The automated learning platform represented by AutoDrive will play an increasingly important role in the research and development of autonomous driving.

2) With the optimization and improvement of algorithms, software and hardware co-design will receive more and more attention.

3) From the perspective of landing, in addition to the continued popularization of L2 assisted driving, unmanned vehicles on low-speed non-public roads are expected to gradually realize productization and scale in the near future.
