Data Science | Data Science Overview-Alibaba Cloud Developer Community

Data Science

Data science is an interdisciplinary field that covers everything related to structured and unstructured data, from preparation and cleaning through analysis to deriving useful insights. It combines mathematics, statistics, intelligent data capture, programming, problem solving, data cleaning, looking at data from different angles, and data preparation and alignment.

In short, it is a combination of technologies and processes for working with data to obtain valuable business insights. It uses scientific methods, algorithms, processes, and systems to extract information effectively, which businesses can then use to make key decisions.

Big Data

Big data has several characteristics, the best known of which are volume, velocity, and variety. Veracity, valence, and value are often added as well.

Volume

This is the essence of big data: there is a very large amount of data. Volume alone does not make the data useful, so it still has to be processed.

Computer processing speed is limited, so data at this scale cannot be processed quickly. Large data volumes therefore bring challenges in storage, access, and processing costs, as well as in scalability and performance.

Velocity

I have seen several interpretations of this term, and translations of it vary. Some people render it as timeliness, but I personally disagree: most English explanations describe it as the speed at which data is processed.

When you process this much data, the speed of access and the speed at which you obtain the required results are crucial.

For example, Google Flu Trends (although it ultimately proved to be a failure) aimed to predict influenza outbreaks by collecting and processing data in real time. If computation is too slow to keep up with the data, the flu may already have broken out before you have worked out whether it is spreading in a given area, and the result is no longer timely. Processing speed is therefore very important.
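To make the point about processing speed concrete, here is a minimal sketch of counting events over a sliding time window, so that each new event is handled as it arrives rather than by recomputing over all stored data. The class name, the window length, and the timestamps are assumptions made for illustration; this is not how Google Flu Trends worked.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen within the last `window_seconds` (hypothetical helper).

    Assumes timestamps arrive in non-decreasing order.
    """

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()

    def add(self, timestamp):
        # Record the new event, then drop anything that fell out of the window.
        self.events.append(timestamp)
        self._evict(timestamp)

    def count(self, now):
        self._evict(now)
        return len(self.events)

    def _evict(self, now):
        # Events at or before (now - window) are no longer inside the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()


# Illustrative timestamps in seconds, e.g. flu-related reports from one region.
counter = SlidingWindowCounter(window_seconds=60)
for t in [0, 10, 50, 70]:
    counter.add(t)

recent_at_70 = counter.count(now=70)
recent_at_115 = counter.count(now=115)
```

Each event is appended once and evicted at most once, so the cost per event stays small however much data has already passed through.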

Variety

Diverse data formats include structured and unstructured data, such as text, audio and video, web pages, and streaming data.

Source diversity: for example, real-time data from a bullet train system and weekly statistics from a Wal-Mart system.

Media diversity: with the development of multimedia, information is spread through more and more media, such as audio, video, and pictures.

Semantic diversity: this has two aspects. On the one hand, the same fact can be expressed in different ways. To take the simplest example, age can be represented as a number or as a category such as child, youth, or elderly. On the other hand, the same word may carry different meanings in different semantic contexts.
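The age example can be sketched in a few lines: records that express age either as a number or as a label are mapped onto a single representation. The function names and the cut-off ages (18 and 60) are assumptions chosen for illustration, not definitions from any standard.

```python
AGE_LABELS = ("child", "youth", "elderly")


def age_group(age):
    """Map a numeric age to a label. The cut-offs are illustrative assumptions."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age < 18:
        return "child"
    if age < 60:
        return "youth"
    return "elderly"


def normalize_age(value):
    """Accept either a numeric age or an existing label and return a label."""
    if isinstance(value, str) and value.strip().lower() in AGE_LABELS:
        return value.strip().lower()
    # Anything else is treated as a number, including numeric strings like "35".
    return age_group(int(value))


# Hypothetical records from sources that encode age differently.
raw_ages = [7, "Elderly", "35", 82]
normalized = [normalize_age(v) for v in raw_ages]
```

Converting to the coarser label loses information, so in practice you might keep the original numeric value alongside the normalized one.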

Veracity

Veracity refers to whether the reliability and quality of the data itself are sufficient as data sources become more diverse. If the data itself is flawed, the results of the analysis will not be correct.
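One simple way to get a first impression of data quality is to count missing and implausible values before any analysis. The sketch below assumes records are plain dictionaries. The field name, the valid range, and the `quality_report` helper are hypothetical.

```python
def quality_report(records, field, low, high):
    """Count missing and out-of-range values for one numeric field."""
    missing = 0
    out_of_range = 0
    for record in records:
        value = record.get(field)
        if value is None:
            missing += 1
        elif not (low <= value <= high):
            out_of_range += 1
    valid = len(records) - missing - out_of_range
    return {
        "missing": missing,
        "out_of_range": out_of_range,
        # Share of records that passed both checks (0.0 for an empty input).
        "valid_ratio": valid / len(records) if records else 0.0,
    }


# Illustrative records: one missing age and one implausible age.
people = [{"age": 34}, {"age": None}, {"age": 212}, {"age": 58}]
report = quality_report(people, field="age", low=0, high=120)
```

A low `valid_ratio` is a signal to investigate the source before trusting any conclusions drawn from it.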

Valence

This is the attribute that is seen and discussed least often.

The term is borrowed from chemical valence and refers to data connectivity, that is, the ratio of the number of connected data items to the total number of possible connections.

When two data items are related, they are said to be connected to each other. Connectivity increases over time, so the relationships within the data become more and more complex. As a result, tasks such as predicting group events and modeling or predicting changes in relationships become increasingly difficult.
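The ratio described above can be computed directly. This is a minimal sketch that assumes connections are undirected and that any two distinct items could be connected, so n items allow n(n-1)/2 possible connections. The `valence` function and the sample data are illustrative.

```python
def valence(num_items, connections):
    """Fraction of possible pairwise connections that actually exist.

    `connections` is a set of frozensets, each holding two connected items.
    """
    possible = num_items * (num_items - 1) // 2
    if possible == 0:
        return 0.0
    return len(connections) / possible


# Hypothetical data set: four items and three distinct connections.
items = ["a", "b", "c", "d"]
pairs = [("a", "b"), ("b", "c"), ("b", "a"), ("c", "d")]
# frozenset makes ("a", "b") and ("b", "a") count as the same connection.
links = {frozenset(pair) for pair in pairs}

ratio = valence(len(items), links)
```

Here three of the six possible connections exist, so the valence is 0.5. As more items become related over time, this ratio rises toward 1.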

Value

As mentioned above, the core of big data is value. All the difficulties and problems of big data come down to how to transform data into value. This includes having large-capacity, easy access to many kinds of data, and the value of high-quality analysis that supports wise decisions.

Data Analysis

Data analysis is the process of analyzing a large amount of collected data with appropriate statistical methods, extracting useful information, forming conclusions, and studying and summarizing the data in detail. This process also supports the quality management system. In practice, data analysis helps people make judgments so that they can take appropriate action.
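As a small illustration of summarizing collected data with statistical methods, the sketch below computes a few descriptive statistics with Python's standard `statistics` module. The sales figures and the `summarize` helper are made up for the example.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics (needs at least two values)."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Hypothetical weekly sales figures with one unusually large week.
weekly_sales = [12, 15, 11, 14, 48]
summary = summarize(weekly_sales)

# A mean well above the median hints that a few large values dominate.
skewed_high = summary["mean"] > summary["median"]
```

In this sample the mean is 20 while the median is 14, which suggests looking at the unusually large week before drawing conclusions from the average alone.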
