Data Science | Data Science Overview-Alibaba Cloud Developer Community

Data Science

Data science is an interdisciplinary field that covers everything related to structured and unstructured data, from preparation and cleaning through analysis to extracting useful insight. It combines mathematics, statistics, intelligent data capture, programming, problem solving, data cleaning, looking at data from different angles, and data preparation and alignment.

In short, it is a combination of several techniques and processes for working with data to obtain valuable business insight. It uses scientific methods, algorithms, processes and systems to extract information that businesses can use to make key decisions.

Big Data

Big data has several characteristic features, the best known of which are volume, velocity and variety. Veracity, valence, and value are often added to the list.

Volume

Volume is the essence of big data: there is simply a very large amount of data. Volume by itself does not make the data useful, so the data still has to be processed.

Limited computing speed means that data at this scale cannot be processed quickly. Large volumes therefore bring challenges in storage, access, and processing costs, as well as in scalability and performance.

Velocity

I have seen several explanations of this word, and many similar translations. Some people think it should be translated as timeliness, but I personally disagree: most English explanations describe it as the speed of data processing.

When you process this much data, the speed of access and the speed of obtaining the required results are crucial.

For example, Google Flu Trends (although it ultimately proved to be a failure) aimed to predict influenza by collecting and processing data in real time. If the computation is too slow to keep up with that much data, the flu may already have broken out before you have worked out whether it was spreading in a given area, and the result has lost its timeliness. Processing speed is therefore very important.

Variety

Variety means diverse data formats, including structured and unstructured data such as text, audio and video, web pages, and streaming data.

Source diversity: for example, real-time data from a bullet train system or weekly statistics from a Wal-Mart system.

Media diversity: with the development of multimedia, more and more media are used to convey information, such as audio, video, and pictures.

Semantic diversity: this has two aspects. First, the same quantity can be expressed in different ways: as a simple example, age can be represented by a number or by categories such as child, youth, and elderly. Second, the same word may carry different meanings in different semantic contexts.
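The age example can be sketched in a few lines of Python. This is only an illustration: the function names `age_group` and `normalize`, the sample records, and the cut-off ages (18 and 60) are assumptions made for the sketch, not definitions from any standard.

```python
def age_group(age: int) -> str:
    """Map a numeric age to a coarse label (cut-offs are illustrative assumptions)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 18:
        return "child"
    if age < 60:
        return "youth"
    return "elderly"


def normalize(value) -> str:
    """Bring numeric and categorical age values into one shared representation."""
    return age_group(value) if isinstance(value, int) else value


# Hypothetical records: two sources encode the same attribute differently.
records = [{"age": 7}, {"age": "elderly"}, {"age": 34}]
labels = [normalize(r["age"]) for r in records]
print(labels)
```

Once both encodings are mapped to the same labels, records from different sources can be compared or grouped together.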

Veracity

Veracity concerns whether the reliability and quality of the data itself are sufficient as data sources become more diverse. If the data itself is flawed, the results of the analysis will not be correct.
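As a rough sketch of what a basic quality check might look like, the Python snippet below counts missing and out-of-range values in one field. The function name `quality_report`, the field `temp_c`, the sample readings, and the plausible range are all hypothetical and chosen only for illustration.

```python
def quality_report(rows, field, low, high):
    """Count missing and out-of-range values for one field in a list of dicts."""
    total = len(rows)
    missing = sum(1 for r in rows if r.get(field) is None)
    out_of_range = sum(
        1 for r in rows
        if r.get(field) is not None and not (low <= r[field] <= high)
    )
    valid = total - missing - out_of_range
    return {
        "total": total,
        "missing": missing,
        "out_of_range": out_of_range,
        "valid_ratio": valid / total if total else 0.0,
    }


# Hypothetical sensor readings: one missing value and one implausible value.
readings = [{"temp_c": 21.5}, {"temp_c": None}, {"temp_c": 999.0}, {"temp_c": 19.0}]
report = quality_report(readings, "temp_c", -50.0, 60.0)
print(report)
```

A low `valid_ratio` is a signal to inspect the source before trusting any analysis built on it.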

Valence

This is the attribute that is seen and discussed least often.

The term is borrowed from chemical valence and refers to data connectivity: the ratio of data items that are actually connected to the total number of possible connections.

When two data items are related, they are said to be connected to each other. Connectivity increases over time, so the relationships between data become more and more complex. As a result, tasks such as predicting group events and modeling and predicting changes in relationships become increasingly difficult.
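The ratio described above can be computed directly. The sketch below assumes connections are undirected pairs of items, so that n items allow n(n-1)/2 possible connections. The function name `valence` and the sample items are hypothetical.

```python
from itertools import combinations


def valence(items, connections):
    """Ratio of actual connections to possible connections (undirected pairs)."""
    # Deduplicate pairs regardless of order and ignore self-connections.
    unique = {frozenset(pair) for pair in connections if len(set(pair)) == 2}
    possible = len(list(combinations(items, 2)))
    return len(unique) / possible if possible else 0.0


items = ["a", "b", "c", "d"]            # four items allow six possible pairs
early = [("a", "b"), ("b", "c")]        # two connections at first
later = early + [("c", "d"), ("a", "d")]  # more connections appear over time

print(valence(items, early))
print(valence(items, later))
```

The second value is larger than the first, which mirrors the point that connectivity tends to grow as more relationships between items appear.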

Value

As mentioned above, the core of big data is value. All the difficulties and problems of big data come down to how to transform data into value. That value comes from large capacity, easy access to varied data, and high-quality analysis that supports wise decisions.

Data Analysis

Data analysis is the process of examining a large amount of collected data with appropriate statistical methods, extracting useful information, forming conclusions, and studying and summarizing the data in detail. It is also a supporting process of a quality management system. In practice, data analysis helps people make judgments so that they can take appropriate action.
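A minimal sketch of this idea, using only Python's standard `statistics` module: summarize a small sample, then flag values that fall well below the mean as candidates for follow-up. The function name `summarize`, the sample numbers, and the "one standard deviation below the mean" rule are illustrative assumptions, not a recommended method.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


# Hypothetical daily order counts for one week.
daily_orders = [120, 135, 128, 90, 142, 131, 125]
summary = summarize(daily_orders)

# Illustrative rule: flag days more than one standard deviation below the mean.
threshold = summary["mean"] - summary["stdev"]
low_days = [v for v in daily_orders if v < threshold]

print(summary)
print(low_days)
```

Here the summary is the extracted information, and the flagged days are the kind of conclusion that might prompt an action.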
