By Garvin Li
Scorecard is a common method used in the credit risk assessment and Internet financing industries. A scorecard does not correspond to one specific machine learning algorithm; it is a general modeling framework. It divides the original data into bins, applies feature engineering to the binned data, and then fits a linear model on the result.
The scorecard modeling principle is applied in various credit assessment fields, such as credit card risk assessment and loan issuance. In addition, scorecard is often used for score assessment in scenarios such as customer service scoring and Zhima Credit scoring (Alipay credit scoring). This article uses a specific case to explain how to use the finance components of the Alibaba Cloud Machine Learning Platform for AI to establish a scorecard modeling scenario.
Click Load More to establish the scorecard experiment directly from a template, as shown in the following screenshot. This template contains the processes and data of the whole experiment.
The preceding screenshot shows an open source dataset from a foreign institution that contains 30,000 records. The dataset includes user properties such as gender, education level, marital status, and age, as well as each user's credit card consumption records and bills over a past period of time. payment_next_month is the target column, indicating whether a user repays the credit card bill (1 means the bill has been repaid; 0 means the bill has not been repaid).
The dataset can be downloaded from https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset
The following diagram shows the experiment process.
Split the input dataset into two parts: one part is used to train the model, and the other is used for prediction and evaluation.
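The split step can be sketched in plain Python as follows. This is only an illustration of the idea, not the platform's split component: the 80/20 ratio, the fixed seed, and the synthetic `records` list are assumptions made for the example.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                   # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)   # seeded for a reproducible split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the 30,000 records (ids and made-up labels only).
records = [{"id": i, "payment_next_month": int(i % 5 != 0)} for i in range(30000)]
train, test = train_test_split(records)
print(len(train), len(test))
```

Shuffling before cutting keeps both parts representative when the source file is ordered, for example by date or by customer ID.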
The binning component is similar to one-hot encoding and can map data according to its distribution into features with higher dimensions. Take the "age" field for example. The binning component can perform binning according to the data distribution in different intervals. The following screenshot shows the binning results.
The final output of the binning component is shown in the following screenshot. Each field is binned to multiple intervals.
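As a rough illustration of the binning idea, the sketch below uses equal-frequency binning, where cut points are chosen so that each interval holds about the same number of records, and then one-hot encodes the bin index. Whether the binning component uses this exact strategy is an assumption; the function names and the sample ages are made up for the example.

```python
from bisect import bisect_right


def equal_frequency_edges(values, n_bins):
    """Return interior cut points so each bin holds roughly the same count."""
    ordered = sorted(values)
    edges = []
    for k in range(1, n_bins):
        edge = ordered[k * len(ordered) // n_bins]
        if not edges or edge > edges[-1]:   # skip duplicate cut points
            edges.append(edge)
    return edges


def assign_bin(value, edges):
    """Bin index for value; bin i covers [edges[i-1], edges[i])."""
    return bisect_right(edges, value)


def one_hot(bin_index, n_bins):
    """Encode a bin index as a 0/1 indicator vector of length n_bins."""
    return [1 if i == bin_index else 0 for i in range(n_bins)]


ages = [21, 23, 25, 28, 30, 33, 35, 38, 41, 45, 52, 60]   # made-up sample
edges = equal_frequency_edges(ages, 4)
bins = [assign_bin(age, edges) for age in ages]
print(edges)
print(one_hot(assign_bin(30, edges), len(edges) + 1))
```

Each original field thus becomes several indicator features, one per interval, which is what lets a linear model assign a separate weight to each age range.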
The population stability index (PSI) is an important metric for measuring the shift caused by sample changes, and it is usually used to evaluate sample stability, for example, whether the sample distribution stays stable from one month to the next. In general, a variable PSI value lower than 0.1 indicates insignificant changes; a PSI value between 0.1 and 0.25 indicates significant changes; a PSI value greater than 0.25 indicates exceptionally significant changes that may require special attention.
In this case, the PSI is computed on the binning results by comparing the data before and after splitting, and the PSI value of each feature is returned, as shown in the following screenshot.
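For reference, PSI is commonly computed per feature as the sum over bins of (a - e) * ln(a / e), where e and a are the shares of records that fall into a bin in the baseline sample and in the comparison sample. The sketch below follows that common definition and the thresholds quoted above; the `eps` floor for empty bins and the bin counts are assumptions made for the example, not the platform's implementation.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index between two binned samples."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both samples must use the same bins")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)   # floor avoids log(0) for empty bins
        a_pct = max(a / a_total, eps)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total


def describe(psi_value):
    """Map a PSI value to the rule-of-thumb ranges quoted in the text."""
    if psi_value < 0.1:
        return "insignificant change"
    if psi_value <= 0.25:
        return "significant change"
    return "exceptionally significant change"


baseline = [250, 250, 250, 250]   # made-up bin counts before the change
shifted = [100, 200, 300, 400]    # made-up bin counts after the change
stable_psi = psi(baseline, baseline)
shifted_psi = psi(baseline, shifted)
print(stable_psi, describe(stable_psi))
print(round(shifted_psi, 3), describe(shifted_psi))
```

Every term of the sum is non-negative, so PSI is zero only when the two binned distributions are identical and grows as they drift apart.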
The results of scorecard training are shown in the following screenshot.
The essence of a scorecard is to represent complex model weights as scores that meet business standards.
The prediction step then outputs a final score for each prediction result (each user's credit score in this case).
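One widely used way to turn linear-model weights into scores is "points to double the odds" (PDO) scaling: score = offset + factor * ln(odds), with factor = PDO / ln 2. The sketch below assumes that convention; it is not confirmed to be what the scorecard training component does. The base score of 600 at odds of 20:1, the PDO of 50, and all weights are hypothetical values chosen for illustration.

```python
import math


def scaling_constants(base_score=600.0, base_odds=20.0, pdo=50.0):
    """Return (factor, offset) so that score = offset + factor * ln(odds)."""
    factor = pdo / math.log(2)                          # points per doubling of odds
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def build_scorecard(intercept, bin_weights, **scale):
    """Convert a linear model on one-hot bins into base points and bin points."""
    factor, offset = scaling_constants(**scale)
    base_points = offset + factor * intercept
    points = {
        feature: {bin_name: factor * w for bin_name, w in bins.items()}
        for feature, bins in bin_weights.items()
    }
    return base_points, points


def score(base_points, points, user_bins):
    """Add the points of the bins a user falls into to the base points."""
    return base_points + sum(points[f][b] for f, b in user_bins.items())


# Hypothetical model: the log-odds of repayment are the intercept plus the
# weights of the bins a user falls into.
INTERCEPT = 1.2
BIN_WEIGHTS = {
    "age": {"<28": -0.30, "28-35": 0.10, ">=35": 0.25},
    "education": {"high school": -0.15, "university": 0.20},
}

base_points, points = build_scorecard(INTERCEPT, BIN_WEIGHTS)
user = {"age": "28-35", "education": "university"}
user_score = score(base_points, points, user)
print(round(user_score))
```

Because the score is a linear function of the log-odds, every bin contributes a fixed number of points, so a reviewer can read off why a given user received a given score.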
Based on users' credit card consumption records, each user's final credit score is obtained by using scorecard model training and scorecard prediction. These final credit scores can be applied in credit investigation fields related to loans or finances.
Visit the Alibaba Cloud Machine Learning Platform for AI page to experience Alibaba Cloud's machine learning capabilities today!