Differences between machine learning, data science, artificial intelligence, deep learning, and statistics - Alibaba Cloud Developer Community

Because data science is a broad discipline, this article starts with the types of data scientists you may encounter in any business; along the way, you may discover that you are a data scientist yourself without knowing it :) Like any scientific discipline, data science borrows techniques from related disciplines, although it has also developed its own arsenal, in particular methods and algorithms for automatically processing very large unstructured data sets, without human intervention, to perform transactions or make predictions in real time.

1. Various Types of Data Scientists

For some earlier perspectives, see the article "9 types of data scientists", published in 2014, or another article from the same year that compares data science with "16 analytic disciplines". More recently (August 2016), Ajit Jaokar discussed the difference between the Type A (Analytics) data scientist and the Type B (Builder) data scientist:

Type A data scientists can code well enough to work with data, but they are not necessarily expert programmers. They may specialize in experimental design, forecasting, modeling, statistical inference, or other topics typically studied in statistics. In general, though, the work product of a data scientist is not "p-values and confidence intervals", as academic statistics sometimes seems to suggest (and as it sometimes is for statisticians working in the traditional pharmaceutical industry). At Google, Type A data scientists are known variously as statisticians, quantitative analysts, decision support engineering analysts, or data scientists, and probably by a few other titles.

Type B data scientists are builders. They share some statistical background with Type A, but they are also strong coders and may be trained software engineers. They are mainly interested in using data in production. They build models that interact with users, often serving recommendations (products, people you may know, ads, movies, search results, and so on).

I have also written about the ABCD's of business process optimization, where D stands for data science, C for computer science, B for business science, and A for analytics science. Data science may or may not involve coding or mathematical practice; for more on this, see my article on low-level versus high-level data science. In a start-up, data scientists usually wear several hats, such as data miner, data engineer or architect, researcher, statistician, modeler (as in predictive modeling), or developer.

Although the data scientist is often portrayed as a programmer proficient in R, Python, SQL, Hadoop, and statistics, this is just the tip of the iceberg, an image popularized by training programs that teach some elements of data science. A lab technician can call herself a physicist, but a real physicist is much more than that, and physicists work in many areas of expertise: astronomy, mathematical physics, nuclear physics, mechanics, electrical engineering, signal processing (also a sub-field of data science), and so on. The same is true of data scientists, whose fields are as varied as bioinformatics, information technology, simulation and quality control, financial engineering, epidemiology, and industrial engineering.

Over the past ten years, I have specialized in machine-to-machine and device-to-device communication, building systems that automatically process large data sets and perform automated transactions, such as purchasing Internet traffic or automatically generating content. This requires developing algorithms that work with unstructured data, and it sits at the intersection of AI (artificial intelligence), IoT (Internet of Things), and data science. It is sometimes called deep data science. It involves relatively little mathematics and relatively little coding (mostly APIs), but it is very data-intensive (including building data systems) and relies on new statistical methods designed specifically for this context.

Before that, I worked mainly on real-time credit card fraud detection. Earlier in my career, I worked on image remote sensing technology, among other things to identify specific patterns (shapes or features, such as lakes) and to perform image segmentation. At that time my research was called computational statistics, while people doing exactly the same thing in computer science called their research artificial intelligence. Today the same research would be called data science or artificial intelligence, and its sub-fields would be signal processing, computer vision, or the Internet of Things.

In addition, data scientists can be found at any point in the life cycle of a data science project, from the data collection or data exploration stage all the way to statistical modeling and the maintenance of existing systems.

2. Machine Learning vs Deep Learning

Before digging deeper into the link between machine learning and data science, let us briefly discuss machine learning and deep learning. Machine learning is a set of algorithms that train on a data set to make predictions or take actions in order to optimize some system. For example, supervised classification algorithms are used to classify loan applicants into good or bad prospects based on historical data. The techniques used for a given task (such as supervised clustering) are varied: naive Bayes, SVM, neural nets, ensembles, association rules, decision trees, logistic regression, or a combination of several of these.
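The loan example can be made concrete with a minimal sketch in plain Python. It assumes a tiny made-up history of applicants and uses logistic regression trained by stochastic gradient descent; the data, the feature choice, and the names `history`, `train`, and `predict` are all hypothetical and only illustrate what "training on historical data" means.

```python
import math

# Hypothetical historical data (made up for illustration):
# (income in units of $100k, debt-to-income ratio) -> 1 = repaid, 0 = defaulted
history = [
    ((0.20, 0.90), 0),
    ((0.25, 0.80), 0),
    ((0.30, 0.70), 0),
    ((0.35, 0.75), 0),
    ((0.60, 0.30), 1),
    ((0.70, 0.20), 1),
    ((0.80, 0.25), 1),
    ((0.90, 0.10), 1),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data, lr=0.5, epochs=2000):
    """Fit logistic regression weights with stochastic gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            p = sigmoid(w[0] * x1 + w[1] * x2 + b)
            err = p - y  # gradient of the log loss with respect to the score
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            b -= lr * err
    return w, b


def predict(model, x):
    """Label a new applicant as a 'good' or 'bad' prospect."""
    w, b = model
    score = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
    return "good" if score >= 0.5 else "bad"


model = train(history)
print(predict(model, (0.85, 0.15)))  # high income, low debt ratio
print(predict(model, (0.27, 0.82)))  # low income, high debt ratio
```

Any of the other techniques listed above could be swapped in for logistic regression; what makes this supervised is that the labels in the historical data drive the parameter updates.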

All of this is a subset of data science. When these algorithms are automated, as in automated piloting or driverless cars, the result is called AI or, more specifically, deep learning. If the collected data comes from sensors and is transmitted over the Internet, then it is machine learning, data science, or deep learning applied to the Internet of Things.

Some people define deep learning differently. They consider deep learning to be neural networks (a machine learning technique) with more layers. Someone asked this question on Quora recently, and the following is a more detailed explanation (source: Quora):

AI (artificial intelligence) is a sub-field of computer science founded in the 1960s. It is concerned with solving tasks that are easy for humans but hard for computers. In particular, a so-called strong AI would be a system that can do anything a human can do (perhaps excluding purely physical tasks). This is a very broad definition that covers all kinds of tasks, such as planning, moving around in the world, recognizing objects and sounds, speaking, translating, performing social or business transactions, and creative work (such as writing poetry or drawing).

NLP (natural language processing) is simply the part of AI that deals with language, usually written language.

Machine learning is concerned with one aspect of this: given an AI problem that can be described in discrete terms (for example, choosing the right action from a set of possible actions), and given a lot of information about the world, figure out the "correct" action without having a programmer code it by hand. Typically, some outside process is needed to judge whether the action was correct. In mathematical terms, this is a function: you feed in some input and you want it to produce the right output, so the whole problem reduces to building a model of this mathematical function in some automatic way. To draw the distinction with AI: if I write a very clever program that exhibits human-like behavior, it can be AI, but unless its parameters are learned automatically from data, it is not machine learning.
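The distinction between a hand-written program and a learned one can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the task (flagging messages by link count), the labeled `examples`, and the function names are hypothetical. The labels play the role of the "outside process" that judges whether an output is correct.

```python
# Hand-written rule: the threshold was chosen by the programmer, not by data.
def handwritten_is_spam(num_links):
    return num_links > 10


# Learned rule: the threshold is chosen automatically from labeled examples.
def learn_threshold(examples):
    """Return the threshold t for which 'x > t' agrees with the most labels."""
    values = sorted({x for x, _ in examples})
    # Candidate thresholds: one below the minimum, then midpoints between values.
    candidates = [values[0] - 1] + [(a + b) / 2 for a, b in zip(values, values[1:])]

    def num_correct(t):
        return sum((x > t) == label for x, label in examples)

    return max(candidates, key=num_correct)


# Hypothetical labeled data: (number of links, is it spam?)
examples = [(0, False), (1, False), (2, False), (3, True), (5, True), (8, True)]

t = learn_threshold(examples)


def learned_is_spam(num_links):
    return num_links > t


print(t)
print(handwritten_is_spam(5), learned_is_spam(5))
```

Both functions map an input to an output, but only the second has a parameter that was fitted to data, which is what makes it machine learning in the sense described above.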

Deep learning is a kind of machine learning that is very popular right now. It uses a particular kind of mathematical model that can be thought of as a composition of simple blocks of a certain type (function composition), where some of these blocks can be adjusted to better predict the final outcome.
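The "composition of simple blocks" view can be illustrated with a toy sketch in plain Python. It assumes a three-block model (a linear block, a ReLU block, and a second linear block) in which only one weight, `w2`, is adjustable; the data and all names are hypothetical, and real deep learning systems adjust many parameters across many layers.

```python
def linear(w, b):
    """A simple block: x -> w * x + b."""
    return lambda x: w * x + b


def relu(x):
    """A simple non-linear block: negative inputs become zero."""
    return max(0.0, x)


def compose(*blocks):
    """Chain blocks so that the output of one is the input of the next."""
    def model(x):
        for block in blocks:
            x = block(x)
        return x
    return model


# Fixed first two blocks; only the last block's weight w2 is adjustable here.
hidden = compose(linear(1.0, -1.0), relu)


def make_model(w2):
    return compose(hidden, linear(w2, 0.0))


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up training data, generated by y = 2 * relu(x - 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.0, 2.0, 4.0, 6.0]

w2 = 0.5
for _ in range(200):
    model = make_model(w2)
    # Gradient of the mean squared error with respect to w2.
    grad = sum(2 * (model(x) - y) * hidden(x) for x, y in zip(xs, ys)) / len(xs)
    w2 -= 0.05 * grad

print(mse(make_model(0.5), xs, ys), mse(make_model(w2), xs, ys))
```

Adjusting `w2` by gradient descent drives the prediction error down, which is the sense in which a block "can be adjusted to better predict the final outcome".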

So what is the difference between machine learning and statistics? One article tries to answer this question. Its author writes that statistics is machine learning with confidence intervals for the quantities being predicted or estimated. I tend to disagree, because I have built engineer-friendly confidence intervals that do not require any knowledge of mathematics or statistics.
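The engineer-friendly method mentioned above is not described in this article. As a stand-in, the sketch below uses the percentile bootstrap, a different and widely used resampling technique that also needs no distributional formulas: resample the data with replacement many times, recompute the statistic each time, and read the interval off the sorted results. The sample data and the name `bootstrap_ci` are hypothetical.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = sorted(
        stat(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up sample: response times in milliseconds.
sample = [12, 15, 11, 19, 14, 13, 22, 16, 15, 18]

low, high = bootstrap_ci(sample)
print(statistics.mean(sample), (low, high))
```

The same function works for other statistics (for example `stat=statistics.median`) without changing the procedure, which is much of its appeal for non-statisticians.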

3. Data Science vs Machine Learning

Machine learning and statistics are both part of data science. The word "learning" in machine learning means that the algorithms depend on some data, used as a training set, to fine-tune some model or algorithm parameters. This covers many techniques, such as regression, naive Bayes, or supervised clustering. But not all techniques fit in this category. For example, unsupervised clustering, a statistical and data science technique, aims to detect clusters and cluster structures without any prior knowledge or training set to help the classification algorithm. A human being is then needed to label the clusters found. Some techniques are hybrid, such as semi-supervised classification. Some pattern detection or density estimation techniques also fit in this category.
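To illustrate unsupervised clustering, here is a minimal sketch of k-means on one-dimensional data in plain Python. The purchase amounts, the deterministic initialization, and the names `kmeans_1d` and `labels` are assumptions made for illustration. Note that the algorithm receives no labels at all; the names given to the clusters at the end are supplied by a person.

```python
def kmeans_1d(points, k=2, iterations=20):
    """Group numbers into k clusters by repeatedly moving centers to cluster means."""
    pts = sorted(points)
    # Deterministic initialization: pick centers spread across the sorted data.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
    return centers, clusters


# Made-up purchase amounts, with no labels attached.
amounts = [5, 7, 6, 8, 95, 102, 99, 110]

centers, clusters = kmeans_1d(amounts, k=2)
print(centers)
print(clusters)

# The algorithm only finds the groups; a human decides what they mean.
labels = {0: "small purchases", 1: "large purchases"}
```

Contrast this with the supervised case, where the labels are part of the training set and steer the fitting itself.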

However, data science is much broader than machine learning. In data science, the data may or may not come from a machine or mechanical process (survey data may be collected manually, clinical trials involve a particular kind of small data, and so on), and it may have nothing to do with "learning" in the sense just described. The main difference, though, is that data science covers the whole spectrum of data processing, not just the algorithmic or statistical aspects.

Of course, in many organizations, data scientists focus on only one part of this process.

This article is forwarded from d1net (reprinted)
