In-depth summary | The security dilemma of machine intelligence

Throughout our lives, we all search for our own ultimate answers. At every moment we hope to move one step closer to the truth in the field we work in, even if it is only a small step. Confronting threats, researching and judging intelligence, analyzing data, understanding knowledge, pursuing intelligence, exploring the heart, gaining insight into human nature, awakening to life, making sense of the world, exploring the universe: in the final analysis, all of these are a desire for the truth.

What is the truth about security? In the past, security was a confrontation between people. Today it is a confrontation between attackers plus attacking machines and defenders plus defending machines, and the final form of security must be autonomous confrontation between machines. From this perspective, the essence of security is a knowledge confrontation between intelligent agents, whether that intelligence is carbon-based or silicon-based. Whether we ourselves reach the end of security hardly matters. What matters is that the machine intelligence we forge, as an extension of human intelligence, is destined to be one step ahead of us and reach the truth of security in advance.

General Technology and Human Development
Technology is an extension of human ability, and inventing technology is humanity's greatest talent. Long before the emergence of modern Homo sapiens, early hominids invented technologies that gave them an advantage in biological competition with other animals. Throughout human history, the core driving force behind successive leaps in productivity and economic development has been the invention, generation after generation, of general-purpose technologies (GPTs). General-purpose technologies profoundly shape the course of human development through their influence on existing economic and social structures.

A general-purpose technology is a single, identifiable, basic and widely shared technology. So far, only about twenty technologies in human history can be classified as general-purpose technologies. They share the following characteristics:

"Ubiquitous" general-purpose technology has a variety of uses and a large number of wide-ranging use scenarios;
"Continuous improvement" As time goes by, the general technology is constantly improving and the cost of use is constantly decreasing;
"Drive innovation" general-purpose technology makes technological innovation and technological invention easier and leads to more new products.

From the agricultural revolution of the Neolithic Age, with the domestication of plants and animals and the invention of writing, to the first industrial revolution of the 18th and 19th centuries, with the steam engine, the factory system, and the railway; to the second industrial revolution, with the internal combustion engine, electricity, the automobile, and the aircraft; and on to the information revolution of the 20th century, with the computer, the Internet, and biotechnology: the interval between inventions of general-purpose technologies keeps getting shorter, their intensity keeps growing, their scope of influence keeps widening, and productivity improves ever faster.
The synergy produced by the technical connections among the general-purpose technologies of the same era has a compounding effect on productivity, economic development, and innovation. In the age of steam, steam engines provided motive power, and railway networks connected physical spaces to transport steel and other materials, which were applied in all kinds of machine systems. In the electrical age, central power stations provided electrical energy, and power grids connected physical spaces to transmit current, which was applied in all kinds of electrical systems.

In the information age, personal computers and servers provide computing power, and the Internet transmits data and connects information systems across digital spaces. In the intelligent age, general-purpose computing (cloud, edge, and other forms) provides computing power, the boundary between physical space and digital space blurs into a fused space, and the Internet of Everything connects the intelligent systems within it. The pattern of synergy among general-purpose technologies is similar across eras: the age of steam gave machines kinetic energy, the age of electricity gave machines electrical energy, the information age gave machines data, and the intelligent age gives machines knowledge.

The Historical Development of Machine Intelligence
Among all general-purpose technologies, machine intelligence is the most special. It is the first technology humans have invented that allows machines to acquire knowledge on their own, and the first time humans have had the ability to create non-carbon-based intelligent agents.

On a cold afternoon in February 1882, the young Nikola Tesla worked out the idea of the alternator that had troubled him for five years and exclaimed ecstatically: "From now on, human beings will no longer be slaves to heavy physical labor. Machines will liberate them, and so will the whole world."

In 1936, in order to prove that there are undecidable propositions in mathematics, the 24-year-old Alan Turing proposed the idea of the "Turing machine". In 1948, he described much of the substance of connectionism in the paper "Intelligent Machinery", and in 1950 he published "Computing Machinery and Intelligence", which proposed the famous "Turing test". In 1951, Marvin Minsky and his classmate Dean Edmonds built the world's first neural network computer.

In 1955, von Neumann accepted an invitation to deliver the Silliman Lectures at Yale University; the content was later compiled into the book "The Computer and the Brain". In 1956, John McCarthy first proposed the concept of "Artificial Intelligence" at the Dartmouth College summer workshop. With that, the prelude to the history of machine intelligence officially opened, and three schools, Symbolism, Connectionism, and Behaviorism, formed one after another.

Over the course of its development, machine intelligence has experienced several waves and cold winters, and the three major schools have each had their ups and downs. From the 1950s onward, Symbolism, represented by expert systems and classical machine learning, dominated for a long time. Connectionism, by contrast, experienced many twists and turns: from the proposal of the perceptron, to the spread of backpropagation in the 1980s, to the success of deep learning fueled by computing power and data, until the three giants Geoffrey Hinton, Yann LeCun, and Yoshua Bengio won the Turing Award in 2018 and the school finally became hot. Behaviorism, represented by reinforcement learning, attracted wide attention after the successes of AlphaGo in 2016 and AlphaZero, and is even regarded by some as the path to general machine intelligence.

The evolution of human intelligence took millions of years; the evolution of machine intelligence has so far taken little more than sixty. Although general machine intelligence is still far away, machine intelligence has already surpassed human intelligence in many fields. Over the past six decades, data computing, storage, and transmission capabilities have each increased by at least ten million times, while data resources have grown far faster than Moore's Law; by 2020 the total amount of global data was estimated to reach 40 ZB. Machine intelligence has reached the critical point at which a general-purpose technology explodes, and under the synergy of other general-purpose technologies, the changes it triggers will be more dramatic than ever before.

From Data-Driven to Intelligence-Driven
"Business intelligence and smart business", "safety intelligence and smart security"... There are many words like this. The core difference between the two is that the former is single-point intelligence, the latter is global intelligence, and the former is based on data. drive, while the latter is based on intelligent drive. "Data-driven" and "intelligence-driven" seem similar but have fundamental differences. The most essential difference is the difference in the decision-making bodies behind them. "Data-driven" ultimately relies on humans to make decisions. Data only provides auxiliary judgment information that can make better decisions, while "smart-driven" means that machines replace humans to directly make online decisions.

The human brain carries cognitive biases as a result of the evolution of life. Limited by its bandwidth for transmitting information and its speed of processing it, humans have, since the early hunter stage, gradually formed a reasoning and decision-making system based on simple heuristics, avoiding the high cost of processing large amounts of information. This allows humans to make quick, almost unconscious decisions in dangerous situations, and it is how our species has survived to this day. But quick, almost unthinking decisions are not always optimal, or even accurate.

Heuristics, once inherited, become preloaded cognitive biases etched into our brains, and these biases influence human decision-making in ways that deviate from rational objectivity. This remained the case until the arrival of the "data-driven" era, when abundant online data began to provide a basis for better decisions. We use general-purpose computing and massive data processing technology to reduce data to a volume the human brain can digest, and use it to assist decision-making in all kinds of application scenarios.

"Data-driven" has incomparable advantages over the previous "intuition-driven" or "experience-driven", but humans still play the role of the "central processing unit" decision-making body, which still has limitations. The throughput limit of the human brain processor cannot handle the full amount of raw data. It can only turn the full amount of data resources into "summary data" or "summary data", and then extract knowledge from it. This process is destined to be accompanied by the loss of information, which will lose some of the hidden relationships, data patterns and insights behind the data in the full amount of data.

"Intelligence-driven" is to allow machine intelligence to directly make online decisions. Whether it is decision-making efficiency, scale, objectivity, or evolutionary growth speed, it is incomparable to "data-driven". "Smart drive" is to directly extract the full amount of knowledge from the full amount of data resources, and then use the full amount of knowledge to directly make global decisions. "Data-driven" is essentially aggregated data plus human intelligence, while "intelligence-driven" is essentially full data plus machine intelligence.

The reality, however, is that in business scenarios a large number of our decisions are not even "data-driven", let alone "intelligence-driven". Achieving "perception" with machine intelligence is only the first step; achieving "decision-making" is the more critical one. The current stage of machine intelligence is just as Churchill put it: "Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning." So what exactly is a real machine intelligence system?

Core Paradigms of Intelligent Systems
A truly intelligent system, in its core paradigm, must contain the following components in each instance: a perception system, a cognition system, a decision-making system, and an action system. At the same time, an instance of an intelligent system is inseparable from its interaction with the environment. In the past we always emphasized and paid too much attention to the internals of the system while easily overlooking the role of interaction with the environment.

The function of the perception system is to observe the environment and accumulate what it observes; its output is data. All data comes from observing and recording the environment, and the motivation behind this is our desire to measure, record, and analyze the world. Information exists in the environment (digital space or physical space) at all times, and in different scenarios we use hardware, software, and algorithms to digitize it: hardware such as sensors and cameras, software such as log recorders and data collectors, algorithms such as intelligent vision and intelligent speech algorithms. One day we will be able to digitize all of physical space and map it completely into data space.

The role of the cognition system is to induce and summarize data and extract knowledge from it. The knowledge humans understand must be expressed in natural language, whereas a machine is trained on data sets that represent the problem space and then uses the trained "model" to reason in new data spaces. As long as it can solve a specific target task, whatever its form, vector, graph, or natural language, it is knowledge; the representation of the feature space is itself a kind of knowledge.

The role of the decision-making system is to plan and decide on target tasks and to generate strategies for them. The action system executes concrete actions according to the strategy, interacts with the environment, and affects it. Actions acting on the environment produce feedback, and that feedback drives the perception system to perceive more data, so the system continuously acquires more knowledge, makes better decisions on target tasks, and forms a closed loop of continuous iterative evolution.

Seen this way, the essence of machine intelligence is an autonomous machine that observes and accumulates data from the environment, summarizes the data to extract knowledge, plans and makes online decisions toward its goals, and takes actions that affect the environment. Machine intelligence is an autonomous machine, and the biggest difference between an autonomous machine and the automated machines of the past is whether it can acquire the knowledge to solve its target tasks on its own.
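As a rough illustration of this closed loop, the sketch below models one intelligent instance as a perceive-cognize-decide-act cycle driven by environment feedback. All class and method names here are hypothetical, chosen only to mirror the four components described above, not taken from any existing system.

```python
# Minimal sketch of the "perception-cognition-decision-action" loop, assuming a
# generic environment object; every name here is illustrative, not a real API.
class IntelligentInstance:
    def __init__(self, perceiver, cognizer, decider, actuator):
        self.perceiver = perceiver    # observes the environment, outputs data
        self.cognizer = cognizer      # summarizes data, extracts knowledge (a model)
        self.decider = decider        # plans strategies for the target task
        self.actuator = actuator      # executes actions that affect the environment
        self.knowledge = None

    def step(self, environment):
        data = self.perceiver.observe(environment)                    # perception: environment -> data
        self.knowledge = self.cognizer.update(self.knowledge, data)   # cognition: data -> knowledge
        strategy = self.decider.plan(self.knowledge)                  # decision: knowledge -> strategy
        feedback = self.actuator.execute(strategy, environment)       # action: strategy -> environment
        return feedback                                               # feedback closes the loop

    def run(self, environment, steps):
        for _ in range(steps):
            self.step(environment)    # each iteration refines knowledge and decisions
```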

Individual Intelligence to Swarm Intelligence
Most of today's intelligent systems are isolated single instances of intelligence, and the corresponding solutions are likewise isolated solutions to single problems. The essence of cloud computing is "computing online", the essence of big data is "data online", and machine intelligence must ultimately realize "intelligence online", so that intelligent instances can interact with one another autonomously online.

A single intelligent instance is an autonomous system composed of the "perception-cognition-decision-action" loop; it has its own representation of the world and can complete its own goals and tasks autonomously. In the same dynamic, complex game environment, instances connect to one another online and interact: they can cooperate, compete, do both at once, or do neither. A change in one instance's policy affects not only its own environment but also the policies of other instances.

Multiple cooperating intelligent instances can choose to share data, knowledge, strategies, or actions, coordinating to complete more complex target tasks and together forming a higher-level intelligent instance. When the density of intelligent instances in a given space becomes large enough, individual intelligence begins to evolve into swarm intelligence.

The Four Quadrants of Intelligence and Security
Security is the most special of all technologies; strictly speaking, it may not even count as a "technology". Security accompanied human activities long before humans invented any technology. To this day, no technology is unique to the security field or has grown out of it, yet security has always accompanied other technologies and complemented them.

There are four ways any general-purpose technology can combine with security, and machine intelligence is no exception. Along one axis there is "security for intelligence" and "intelligence for security"; along the other, the "attack perspective" and the "defense perspective". Security for intelligence means that machine intelligence itself brings new security problems: on one hand, problems caused by the vulnerability of machine intelligence itself; on the other, problems it creates in the surrounding scenarios. Intelligence for security means applying machine intelligence to security scenarios: attackers use it to empower attacks, and defenders use it to empower defense.

In these four quadrants, the timing and maturity of the intersection between a new technology and security differ. Attackers have stronger motivation and greater interest than defenders, so the attack-related quadrants usually explore and adopt new technologies more readily. Defenders always lag behind and are prone to indulging in the false sense of security created by old technologies and human experience, which is why the fourth quadrant is always the most lagging and slowest to develop. Of course, this is also directly related to the nature and difficulty of the defensive perspective itself.

The Security Dilemma of Machine Intelligence
Go is a simple complex game, while security is a complex simple game. In 1994, the cognitive scientist Steven Pinker wrote in "The Language Instinct" that, for machine intelligence, "the hard problems are easy and the easy problems are hard." A "simple complex problem" is one whose problem space is closed but whose problem itself is relatively complex. A "complex simple problem" is one whose problem space is infinitely open but whose problem itself is not particularly complex. Today, machine intelligence often outperforms humans on "simple complex problems", but on "complex simple problems" it often fails because of the curse of dimensionality and the limits of generalization.

Security is a typical "complex simple problem", and Moravec's paradox is more obvious in the field of security. High uncertainty is the biggest feature of security, and the biggest dilemma of security itself is how to deal with the "unknown unknown". In many cases, we rush forward and say that we need to use machine intelligence to solve the problem without clearly defining the problem. This is the main reason why most machine intelligence fails in the security field. Today, in the field of security, there is little need to break through the ceiling of smart technology. Instead, what needs to be solved is "clearly defined problems", that is, how to close the problem space.

The security problem space is usually unbounded, and the positive and negative sample spaces corresponding to it are severely asymmetric. The severe shortage of negative data (attack data, risk data, and so on) caused by "unknown unknowns" makes the feature space asymmetric, which in turn means the feature space cannot truly represent the problem space. A "model" is a hypothesis about the world formed in an existing data space and used to reason in a new data space. Today's machine intelligence can handle nonlinear, complex relationships between input and output, but it remains weak in the face of the huge gap between the sample space and the problem space.

In the 1970s, the Bell-LaPadula security model stated that "a system is secure only if it starts in a secure state and never falls into an insecure state." Since the essence of security is confrontation, confrontation condemns most machine intelligence models in the security field to the fate of "decaying the moment they go online". A model that performs well on the training set, once deployed in a large-scale real environment, provokes an escalation of the confrontation from the moment it goes online and then sinks steadily toward failure. Model decay is as inevitable as the increase of entropy in a closed system.

At the same time, security scenarios are highly sensitive to the accuracy and interpretability of detection results. Compared with the rule-based and policy-based detection commonly used in traditional security, machine intelligence has the advantage of powerful representation, but its lack of explainability and its ambiguity mean its inference results cannot be used directly in decision-making scenarios. Most of today's intelligent security systems only do "perception", and at most provide input for auxiliary decisions.

Yet none of these is the biggest "difficulty". The biggest difficulty of machine intelligence in the security field is a dilemma of thinking styles. The security way of thinking is to "hold to the orthodox and win by surprise", while the machine intelligence way of thinking is to "model the world". Not only is the gap between these two modes of thinking huge, it is also extremely hard to bridge. On the one hand, very few people can command both ways of thinking at once; on the other hand, it is extremely difficult to get people with the two kinds of thinking to work together.

Problem space, sample space, inference results, resistance to decay, and thinking styles together explain why the real-world performance of most of today's intelligent security systems is unsatisfactory. Or, to put it more pessimistically: in the security field today, there is still no truly intelligent security system.

A Truly Intelligent Security System
Let us first discuss the common data paradigm in security scenarios. The Platonists held that "the world we perceive is the projection on the wall of the cave": the phenomenal world is a reflection of the world of ideas, and the world of ideas is the essence or origin of the world. The "allegory of the cave" implies that there is an external, objective system of knowledge that does not depend on human cognition, and that the human exploration of knowledge is a process of continually probing and conjecturing about this objective system from observations of phenomena in the real world. Aristotle further established the original idea of ontology, defining it as the science of "being", the basic branch of metaphysics. In the 17th century the philosopher R. Goclenius first proposed the term "Ontology". In the 1960s the field of machine intelligence began to adopt the idea of ontology, which later evolved into the semantic web and the knowledge graph.

The essence of confrontation in security is the confrontation of knowledge: the side that acquires more knowledge gains more asymmetric advantage. Threat analysis, intelligence research and judgment, attack detection, incident tracing: all of these are, in essence, processes of exploring knowledge, which is why Palantir's Gotham, IBM's i2, UEBA, and the various threat intelligence products all borrow, to a greater or lesser extent, from the idea of ontology.

The common data paradigm in security scenarios is likewise inseparable from ontology. With five metadata types, entities, attributes, behaviors, events, and relationships, the data architecture of every security scenario can be constructed, whether basic security, business security, data security, public security, or urban security. (Note: the public security field also treats "trajectory" as a separate metadata type, but a trajectory is a special kind of "behavior" data, so it is folded into behavior here.)

Entity: an entity is an object that exists objectively and can be distinguished from other objects;
Attribute: an attribute is a label, an expression that describes an entity and characterizes its abstract aspects;
Behavior: a behavior is an action performed by an entity at a specific time and place;
Event: an event is an identifiable occurrence recognized within a certain time, space, or condition;
Relationship: a relationship is the degree and representation of association between one entity and another.
Most of the source data accumulated in the security field is behavioral data: network traffic logs, host command logs, business logs, camera data streams, sensor data streams, and so on are all behavioral data. Entities, attributes, relationships, and events are all extracted from behavioral data, generated by running different functions over different behavioral data.
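As a minimal sketch of this five-type data paradigm, the snippet below models the metadata types and shows an event being derived from behaviors by a function. The field names and the brute-force example are illustrative assumptions, not a fixed schema from the text.

```python
# Minimal sketch of the five metadata types; field names are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Entity:
    entity_id: str                                      # an objectively existing, distinguishable object
    attributes: dict = field(default_factory=dict)      # labels describing the entity

@dataclass
class Behavior:
    entity_id: str                                      # who acted
    action: str                                         # what was done
    timestamp: float                                    # when
    location: str                                       # where (host, URL, network segment, ...)

@dataclass
class Event:
    name: str                                           # an identifiable occurrence in some time/space/condition
    behaviors: List[Behavior]                           # the behaviors the event was extracted from

@dataclass
class Relationship:
    source_id: str
    target_id: str
    kind: str                                           # association between one entity and another

def extract_brute_force_events(behaviors: List[Behavior], threshold: int = 10) -> List[Event]:
    """A toy 'function over behavioral data': many failed logins by one entity -> an event."""
    failed = [b for b in behaviors if b.action == "login_failed"]
    by_entity = {}
    for b in failed:
        by_entity.setdefault(b.entity_id, []).append(b)
    return [Event(name="possible_brute_force", behaviors=bs)
            for bs in by_entity.values() if len(bs) >= threshold]
```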

When the function produces events, we have a security detection problem: attack detection, threat detection, risk detection, anomaly detection, and so on. The atomic paradigm of most security detection problems can be abstracted as Y = F(X), where X is the behavioral data of an entity, Y is the detection result, and F is the detection model. F can be based on rules, policies, lexical semantics, statistics, machine learning, deep neural networks, and so on; Y can be normal, abnormal, attack, or unknown.
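In code, Y = F(X) can be read as a common detector interface that very different F's implement. The sketch below shows a rule-based F and a statistical F sharing the same shape; all names are hypothetical and the two detectors are deliberately simplistic.

```python
# Y = F(X): X is an entity's behavioral data, Y is a label, F is the detection model.
from typing import Iterable, Protocol

LABELS = ("normal", "abnormal", "attack", "unknown")

class Detector(Protocol):
    def detect(self, behaviors: Iterable[dict]) -> str: ...   # returns one of LABELS

class RuleBasedDetector:
    """F as rules: flag behaviors whose payload matches a known attack signature."""
    def __init__(self, signatures: set):
        self.signatures = signatures

    def detect(self, behaviors):
        return "attack" if any(b.get("payload") in self.signatures for b in behaviors) else "normal"

class StatisticalDetector:
    """F as statistics: flag entities whose request rate deviates from a learned baseline."""
    def __init__(self, baseline_rate: float, tolerance: float = 3.0):
        self.baseline_rate = baseline_rate
        self.tolerance = tolerance

    def detect(self, behaviors):
        behaviors = list(behaviors)
        if not behaviors:
            return "unknown"
        duration = max(1.0, behaviors[-1]["ts"] - behaviors[0]["ts"])
        rate = len(behaviors) / duration
        return "abnormal" if rate > self.tolerance * self.baseline_rate else "normal"
```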

More complex detection scenarios can be assembled and orchestrated from basic F's and various operators. Each kind of F has its strengths and weaknesses and its own best-fit scenarios; there is no absolutely superior detection technology. In fact, the most important thing for algorithms in security detection is not building the detection model itself, but whether the system can autonomously generate the optimal detection model for each scenario and autonomously, continuously iterate on it.
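One naive way to read "autonomously generate and iterate the optimal model per scenario" is a selection loop that evaluates candidate F's against each scenario's labeled feedback and keeps the best one. This is only a sketch of the idea, with hypothetical names; it is not the mechanism the author describes.

```python
# Sketch: per-scenario detector selection driven by feedback, a stand-in for
# "autonomously generate and continuously iterate the optimal detection model".
def evaluate(detector, labeled_samples):
    """Fraction of labeled (behaviors, expected_label) pairs the detector gets right."""
    if not labeled_samples:
        return 0.0
    hits = sum(1 for behaviors, expected in labeled_samples
               if detector.detect(behaviors) == expected)
    return hits / len(labeled_samples)

def select_best_detector(candidates, labeled_samples):
    """Pick the candidate F that scores best on this scenario's feedback data."""
    return max(candidates, key=lambda d: evaluate(d, labeled_samples))

def iterate_per_scenario(scenarios, candidate_factories):
    """For each scenario, rebuild candidate detectors and re-select as feedback accumulates."""
    chosen = {}
    for name, labeled_samples in scenarios.items():
        candidates = [make() for make in candidate_factories]
        chosen[name] = select_best_detector(candidates, labeled_samples)
    return chosen   # re-run periodically so decayed models get replaced
```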

A truly intelligent security system must likewise have a perception system, a cognition system, a decision-making system, and an action system, and must form a closed feedback loop with the environment. The perception system includes at least an anomaly sensor, an attack sensor, a false negative sensor, and a false positive sensor. The role of the "anomaly sensor" is, on one hand, to maintain the ability to perceive "unknown unknowns", and on the other, to use the idea of "defining normal in order to find the abnormal" to ease the sample space dilemma. The role of the "attack sensor" is to detect attacks on top of the anomalous data; by narrowing the input it also greatly reduces the scope of false positives and false negatives in the inference results, which addresses the inference result dilemma. The "false negative sensor" and "false positive sensor" exist to address the decay problem. Seen from here, the "attack detection with algorithms" that the industry cares about most is actually just one small step in the perception system of an intelligent security system.
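The cascade described above can be sketched as a pipeline in which the anomaly sensor filters the raw behavior stream, the attack sensor classifies only what is anomalous, and the false positive / false negative sensors watch the outcomes to flag decay. Every class name and field below is an illustrative assumption.

```python
# Sketch of the four-sensor perception cascade; all names are illustrative.
class AnomalySensor:
    """Defines 'normal' and lets only deviations through (perceiving unknown unknowns)."""
    def __init__(self, normal_profile):
        self.normal_profile = normal_profile
    def filter(self, behaviors):
        return [b for b in behaviors if b["pattern"] not in self.normal_profile]

class AttackSensor:
    """Runs attack detection only on anomalous data, shrinking the error surface."""
    def __init__(self, detector):
        self.detector = detector
    def classify(self, anomalies):
        return [(b, self.detector.detect([b])) for b in anomalies]

class FalsePositiveSensor:
    """Counts analyst-confirmed false alarms; a rising rate signals model decay."""
    def rate(self, verdicts, analyst_feedback):
        flagged = [b for b, label in verdicts if label == "attack"]
        if not flagged:
            return 0.0
        return sum(1 for b in flagged if analyst_feedback.get(b["id"]) == "benign") / len(flagged)

class FalseNegativeSensor:
    """Compares later-confirmed incidents against what was flagged, to measure misses."""
    def rate(self, verdicts, confirmed_incidents):
        flagged_ids = {b["id"] for b, label in verdicts if label == "attack"}
        if not confirmed_incidents:
            return 0.0
        return sum(1 for b in confirmed_incidents if b["id"] not in flagged_ids) / len(confirmed_incidents)
```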

The cognition system accumulates all kinds of security knowledge, including at least knowledge of the normal, knowledge of attacks, knowledge of false negatives, and knowledge of false positives. Security knowledge can take the form of expert rules, vectors, models, graphs, natural language, and so on, but whatever its form, it must be refined into individualized, "a thousand faces for a thousand objects" knowledge: for each protected object (a user, a system, an asset, a domain name, a dataset, and so on), the system forms its own set of knowledge for perceiving anomalies, attacks, false negatives, and false positives. The decision-making system includes at least interception strategies for target tasks and online/offline strategies for the various models, and it can decide on its own which behaviors should be blocked and which models have decayed and should be retrained or replaced.
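A per-protected-object knowledge store could be sketched roughly as follows: each protected object keys its own normal / attack / false negative / false positive knowledge, and the decision layer consults only that object's entry. The structure and the toy blocking rule are assumptions made for illustration.

```python
# Sketch of "a thousand faces for a thousand objects": per-object security knowledge.
from dataclasses import dataclass, field

@dataclass
class ObjectKnowledge:
    normal: dict = field(default_factory=dict)            # baseline profile of normal behavior
    attack: dict = field(default_factory=dict)            # attack patterns known for this object
    false_negatives: list = field(default_factory=list)   # confirmed misses, fed back into retraining
    false_positives: list = field(default_factory=list)   # confirmed false alarms, fed back as well

class KnowledgeBase:
    def __init__(self):
        self._store = {}   # protected object id -> ObjectKnowledge

    def for_object(self, object_id: str) -> ObjectKnowledge:
        return self._store.setdefault(object_id, ObjectKnowledge())

    def should_block(self, object_id: str, behavior: dict) -> bool:
        """Toy decision rule: block only what this object's own attack knowledge matches."""
        knowledge = self.for_object(object_id)
        return behavior.get("pattern") in knowledge.attack
```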

The action system contains the various actions that affect the environment, such as releasing, blocking, retraining, and taking models offline. A true intelligent security instance contains thousands of agents, each acting only on its own protected object. Finally, the way out of the problem space dilemma is to converge the open problem space into small, closed risk scenarios: on one hand through the deep detection formed by cascading the four sensors, and on the other through the "a thousand faces for a thousand objects" agents scoped to individual protected objects.

Machine Intelligence Reshapes New Security
Over its development, the security field has remained in a stage where few problems are eliminated but many concepts are created; it urgently needs new technology to genuinely solve old problems. The popularity of machine intelligence across industries has also caught the security industry's attention. But the intelligence capabilities in today's security field vary widely and are hard to tell apart, to the point that anyone who uses a bit of an algorithm claims an "XX security system based on artificial intelligence". Like intelligent driving in its early years, intelligent security today urgently needs a unified grading standard to clarify the differences between levels of intelligent security technology. Since "the essence of security is the confrontation between intelligent agents", we grade intelligent security by the degree of autonomous confrontation into six levels, L0 through L5:

The L0 level is "manual confrontation", that is, there is no ability of machine intelligence at all, and the defenders artificially rain the attackers to fight, and the confrontation operation, perception judgment, and task support are all carried out manually.

The L1 level is "assisted confrontation". Machines complete the attack detection and attack defense of known attacks, and the rest of the operations (such as sensing unknown threats, sensing false positives, and sensing false positives, etc.) are performed by humans.

The L2 level is "low-level autonomous confrontation". The machine completes the known attack attack detection and attack defense, and has the ability to perceive unknown threats or false positives and negative negatives, and the rest is operated by humans.

The L3 level is "moderate autonomous confrontation". All confrontation operations (attack detection, attack defense, active perception of unknown threats, false positives and false negatives active perception, automatic learning of confrontation upgrades) are completed by machines. When to respond (intermediate process must require human participation).

The L4 level is "highly autonomous confrontation". All confrontation operations are completed by machines. According to system requirements, humans may not necessarily provide all responses (human participation is not required in the intermediate process), but they can only be used in limited specific security scenarios ( Such as network domain, host domain, etc.).

The L5 level is "completely autonomous confrontation". All confrontation operations are completed by machines. According to system requirements, humans may not necessarily provide all responses, and they are not limited to specific scenarios and act on the entire domain.

Unlike intelligent driving, where different levels use completely different technology stacks, L0-L5 in intelligent security must be built up and evolved step by step. By this grading, most security systems in the industry today are L1 systems, very few reach L2, and no real L3-or-above intelligent security system exists yet. As the level rises, defenders are gradually released from low-level confrontation and can pay more attention to high-level confrontation. L3 is the watershed, and it is expected to be realized within five years. "It starts with Go and ends with security": what is the endgame of machine intelligence in the security field? The network layer, host layer, application layer, business layer, and data layer each have their own intelligent instances, and the instances of different layers interconnect online to realize collaborative defense and intelligence sharing in the true sense. The day when intelligence in the sense of "smartness" and intelligence in the sense of "threat intelligence" merge is the day "intelligence reshapes new security" becomes real.

At present, the Alibaba Cloud Intelligent Security Lab is building L3-level intelligent security systems in many fields and is committed to applying intelligent technology to cloud security. It is recruiting security algorithm experts and security data experts, looking for like-minded people to explore "intelligence reshaping new security" together. In less than a year, we have achieved some phased results:

The LTD attack detection algorithm was accepted at IJCAI 2019: "Locate Then Detect: Web Attack Detection via Attention-Based Deep Neural Networks";
The WAF AI core helped Alibaba Cloud WAF enter the 2019 Gartner Magic Quadrant for Web Application Firewalls, with its algorithm capability rated as strong;

The Anti-Bot AI core helped Alibaba Cloud bot management enter the 2018 Forrester Bot Management Competitors Quadrant;
The content security algorithms helped Alibaba Cloud pass major national security-assurance events smoothly, with no risk leakage;
Launched a series of security data platform products and services, including the [XDATA] security data core, [XID] core data assets, [XService] intelligent security services, and the [string+] security knowledge engine; launched graph computing applications on complex networks with tens of billions of nodes and hundreds of billions of edges, and online complex stream computing applications with QPS in the tens of millions.
