Managing risk is a challenging enterprise, and errors can lead to catastrophic consequences. Today, big data analytics using tools like Hadoop or Splunk has seen an uptick among corporations looking to mitigate risk. The optimism is that reviewing big data can yield insights that help manage risk more effectively and thus prevent disasters such as the 2008 financial crisis. For example, many banks now perform real-time analytics on customer data such as credit history, transaction history, and employment history to more accurately determine which customers represent a high or low risk for a mortgage or loan.
In the same way, numerous product manufacturers are utilizing big data analytics in order to determine their customers' likes and dislikes, enabling them to create products that meet their customers' specific tastes. Doctors are using big data to determine high risk patients who require more immediate care. The energy industry is using big data to spot problems in the production process early on before they develop into something unmanageable. And the list goes on across a plethora of different industries.
Nevertheless, while big data offers tremendous potential to manage risk across many industries and sectors, it's important to avoid common mistakes when handling that data. These mistakes can produce inaccurate results that increase risk instead of reducing it.
Data scientists must ensure the data they are using is a relevant and complete representation of what they want to analyze (such as customer behavior, or oil pressures). Using incomplete or skewed data sets can lead to erroneous conclusions that will undermine risk management.
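Completeness and skew checks like these can be sketched in a few lines before any modeling begins. The sketch below is purely illustrative; the field names, records, and the idea of checking region balance are invented assumptions, not a prescribed method:

```python
# Hypothetical sketch: basic completeness and sample-skew checks on a
# customer data set before it feeds a risk model. All field names and
# values below are invented for illustration.
records = [
    {"income": 52000, "credit_score": 710, "region": "north"},
    {"income": None,  "credit_score": 640, "region": "north"},
    {"income": 87000, "credit_score": 580, "region": "north"},
    {"income": 43000, "credit_score": 700, "region": "south"},
]

def completeness(records, field):
    """Fraction of records where the field is present (not None)."""
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)

def region_share(records):
    """Share of records per region, to spot an over-represented segment."""
    counts = {}
    for r in records:
        counts[r["region"]] = counts.get(r["region"], 0) + 1
    return {k: v / len(records) for k, v in counts.items()}

print(completeness(records, "income"))  # 0.75 -> a quarter of incomes are missing
print(region_share(records))            # {'north': 0.75, 'south': 0.25} -> skewed sample
```

If the completeness ratio or segment shares fall outside agreed thresholds, the data set should be repaired or re-sampled before any risk conclusions are drawn from it.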
Historical data is important for generating insights to manage risk. However, it is recommended to also incorporate the most up-to-date data available, preferably in real time, for the most accurate insights. With the world continually in flux, what was true yesterday may not be true today.
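One common way to let recent data dominate without discarding history is exponential decay weighting. The sketch below is an assumption-laden illustration, not a method from the article; the half-life value is arbitrary:

```python
# Hypothetical sketch: exponentially down-weight older observations so
# recent behavior dominates the estimate. The 30-day half-life is an
# invented assumption for illustration.
def decayed_mean(values, ages_days, half_life=30.0):
    """Weighted mean where an observation's weight halves every `half_life` days."""
    weights = [0.5 ** (age / half_life) for age in ages_days]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# A default observed today (age 0) vs. a clean record from 30 days ago:
# the fresh observation carries twice the weight of the month-old one.
print(decayed_mean([1.0, 0.0], [0, 30]))  # 0.666...
```

A plain average of the same two observations would be 0.5; the decayed mean moves toward the fresher signal, which is the point of folding real-time data into the estimate.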
A frequent mistake when performing big data analytics is not including all the pertinent variables in the calculations. Data scientists must ensure that all relevant variables (e.g. customer income, credit history, and employment history for evaluating mortgage suitability) are captured, since even one missing variable can dramatically alter the accuracy of the result. Deciding what the pertinent variables are is not always straightforward, often requiring deep thought and sometimes trial-and-error iteration.
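The effect of an omitted variable can be shown with a toy linear risk score. Everything below (the variables, weights, and applicant values) is invented for illustration; real scoring models are far more involved:

```python
# Hypothetical sketch: a toy linear risk score showing how dropping one
# relevant variable (employment history) shifts the result. All weights
# and applicant values are invented; lower score = lower risk here.
def risk_score(applicant, weights):
    """Weighted sum over whichever variables the weights dict includes."""
    return sum(weights[k] * applicant[k] for k in weights)

applicant = {"income": 0.4, "credit_history": 0.9, "employment_years": 0.1}

full_model    = {"income": -0.5, "credit_history": -0.4, "employment_years": -0.3}
missing_model = {"income": -0.5, "credit_history": -0.4}  # employment omitted

print(risk_score(applicant, full_model))     # -0.59
print(risk_score(applicant, missing_model))  # -0.56 -> omission shifts the score
```

For an applicant near a decision threshold, even a small shift like this can flip the outcome, which is why variable selection deserves the deliberate iteration described above.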
Perhaps the most serious mistake of all is cherry-picking the data set to produce results skewed by the analyst's bias. Data scientists must be very careful not to let their subjective views affect which data sets they select for evaluation. This point seems highly relevant in today's era of 'fake news', where people gravitate toward news they want to be true, even if it isn't. The same principle applies to big data analytics.
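The danger of cherry-picking can be demonstrated by answering the same question from a hand-picked subset versus a random sample. The population and default rate below are fabricated solely for this demonstration:

```python
# Hypothetical sketch: estimating a default rate from a cherry-picked
# subset vs. a random sample. The population is invented: 1000 loans,
# of which 50 defaulted (1 = default, 0 = repaid).
import random

population = [1] * 50 + [0] * 950  # true default rate: 5%

# A biased analyst keeps only loans that match the story they want to tell.
cherry_picked = [x for x in population if x == 0][:100]

# An unbiased analysis draws uniformly at random from the full data set.
random.seed(42)  # fixed seed so the run is reproducible
random_sample = random.sample(population, 100)

print(sum(cherry_picked) / len(cherry_picked))  # 0.0 -> the risk vanishes entirely
print(sum(random_sample) / len(random_sample))  # an estimate near the true 5%
```

The cherry-picked subset reports zero risk where 5% genuinely exists, which is exactly how a biased selection turns an analytics exercise into a confirmation of what the analyst already believed.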