What is Data Science

Data science is a discipline that makes data useful. It encompasses three core fields: statistics, machine learning, and data mining/analysis.

Definition of Data Science

If you look back at the early history of the term data science, you will find two closely connected themes:

  • Bigger data means computers have to be used more and more often.
  • Statisticians find it difficult to turn the algorithms they write on paper into working computer code.

Data science emerged where these two themes meet. Early on, people thought of a data scientist as a statistician who knew how to code. That description no longer seems accurate. First, let us return to data science itself.

In 2003, the Data Science Journal stated: "'Data science' refers to anything related to data." I agree: today, almost nothing can be separated from data.

Since then, definitions of data science have multiplied endlessly, such as Conway's Venn diagram and the classic view of Mason and Wiggins.

The Wikipedia definition of data science is the closest to what I teach my students:

Data science is a concept that unifies statistics, data analysis, machine learning, and their related methods in order to "understand and analyze" actual phenomena with data.

Simply put, data science is a discipline that makes data useful.

Data mining

If you don't yet know which decision you need to make, the best approach is to look for inspiration. This activity goes by many names: data mining, data analysis, descriptive analysis, exploratory data analysis, or knowledge discovery.

The method is simple: think of your dataset as a pile of negatives you have found in a darkroom. Data mining means working the equipment to develop all the pictures as quickly as possible so that you can see whether any of them are inspiring. As with photos, don't take what you see too seriously. You didn't take these photos, so you know little about what lies outside the frame. The golden rule of data mining is: draw conclusions only about what you can see, never about what you can't, because going beyond the data requires statistics and more domain expertise.

Beyond that, aim for speed: expertise in data mining is judged by how quickly you can examine the data. Don't get fixated on things that merely look interesting.
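To make the "develop every picture quickly" idea concrete, here is a minimal Python sketch, assuming the dataset fits in memory as a list of rows that all share the same numeric columns. The function name `describe` and the toy rows are hypothetical illustrations, not part of any particular tool. It produces a fast descriptive summary of every column; in keeping with the golden rule, the numbers describe only these rows and say nothing about data you have not seen.

```python
import statistics
from typing import Dict, List


def describe(rows: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Return count, min, max, mean and median for every column.

    Assumes every row has the same numeric columns (hypothetical helper).
    This is a fast first look, not evidence about anything outside these rows.
    """
    if not rows:
        return {}
    summary: Dict[str, Dict[str, float]] = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        summary[column] = {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical toy rows, made up for illustration only.
    rows = [
        {"visits": 3, "minutes": 12.0},
        {"visits": 5, "minutes": 20.0},
        {"visits": 4, "minutes": 7.0},
    ]
    # Print one summary line per column, so every "picture" is seen once.
    for column, stats in describe(rows).items():
        print(column, stats)
```

Anything that stands out in such a summary is a lead to follow up, not a conclusion.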

Statistical inference

Inspiration is easy to come by; rigor is hard to achieve. To master data, you need specialized training. Having studied statistics as both an undergraduate and a graduate student, I consider statistical inference (statistics for short) the most difficult and most philosophical of the three fields. Doing it well takes a lot of time.

If you plan to make high-quality, risk-controlled decisions that rely on more than the data you already have, it is time to add statistical skills to your analytics team.

When the situation is uncertain, statistics can help you decide whether the evidence should change your mind.
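As one illustration of how statistics decides whether evidence should change your mind, the following is a minimal sketch of a two-sample permutation test in Python. The function names, the default of 10,000 resamples, and the 0.05 significance level are assumptions chosen for illustration. The null hypothesis plays the role of your default belief (the two groups do not differ); you change your mind only if the observed difference in means would be rare when the group labels are shuffled at random.

```python
import random
import statistics
from typing import Sequence


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in means.

    Null hypothesis (the default belief): group labels do not matter, so any
    observed difference in means is just noise from how the data was split.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Counting the observed split itself (+1) keeps the p-value above zero.
    return (extreme + 1) / (n_resamples + 1)


def should_change_mind(a: Sequence[float], b: Sequence[float],
                       alpha: float = 0.05) -> bool:
    # Abandon the default belief only if the data would be surprising under it.
    return permutation_p_value(a, b) < alpha
```

For example, `should_change_mind([10, 11, 12, 13, 14], [20, 21, 22, 23, 24])` returns True for these made-up numbers, while two identical groups give a p-value of 1.0 and leave the default belief in place. A permutation test is used here only because it needs no distributional assumptions and fits in a few lines; a t-test or a Bayesian analysis could serve the same purpose.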

Machine learning

In essence, machine learning uses examples, rather than explicit instructions, to tell a computer how to perform a task. I have also written several articles about machine learning, covering how it differs from artificial intelligence, how to get started with it, lessons learned from applying it in enterprises, and how to introduce children to supervised learning.
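To contrast instructions with examples, the sketch below puts a hand-written rule next to a rule learned from labeled data. It is a minimal Python illustration, assuming a single numeric feature and a yes/no label; the functions, the threshold of 50, and the toy examples are hypothetical. `learn_threshold` is a one-dimensional decision stump: it tries cut points between adjacent sorted values and keeps the one that misclassifies the fewest examples.

```python
from typing import Callable, List, Tuple


def hand_written_rule(x: float) -> bool:
    # Instructions: a person chooses the threshold (50 is an arbitrary, hypothetical value).
    return x > 50.0


def learn_threshold(examples: List[Tuple[float, bool]]) -> float:
    """Learn a cut point from (feature, label) examples; predict True above it."""
    if not examples:
        raise ValueError("need at least one labeled example")
    xs = sorted(x for x, _ in examples)
    # Candidate cuts: below everything, midway between neighbours, above everything.
    candidates = [xs[0] - 1.0]
    candidates += [(lo + hi) / 2 for lo, hi in zip(xs, xs[1:])]
    candidates.append(xs[-1] + 1.0)

    def errors(t: float) -> int:
        # Count examples whose predicted label (x > t) disagrees with the true label.
        return sum((x > t) != label for x, label in examples)

    # min() keeps the first candidate among ties.
    return min(candidates, key=errors)


def make_classifier(threshold: float) -> Callable[[float], bool]:
    # Examples: the threshold came from the data, not from a person.
    return lambda x: x > threshold


if __name__ == "__main__":
    # Hypothetical toy data: (hours practiced, passed?)
    data = [(1.0, False), (2.0, False), (3.0, True), (4.0, True)]
    t = learn_threshold(data)
    learned = make_classifier(t)
    print("learned threshold:", t)  # prints: learned threshold: 2.5
    print(learned(2.8), hand_written_rule(2.8))  # prints: True False
```

The point of the design is that changing the behavior of the learned classifier only requires different examples, whereas the hand-written rule must be edited by a person.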

Alibaba Clouder

2,606 posts | 737 followers
