This topic describes how to use the financial components that are provided by Machine Learning Platform for AI (PAI) to create a scorecard model based on credit card billing statements.
Scorecard is a modeling method that is commonly used in the credit risk assessment and Internet finance fields. It is not only a machine learning algorithm, but also a generic modeling framework. The scorecard modeling process includes the following steps: bin the raw data, perform feature engineering on the data in each bin, and then use the processed data to train a linear model.
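The three steps above can be illustrated with a minimal, self-contained sketch. It is not the PAI implementation: the toy data, the bin edges, and the helper names (`bin_index`, `one_hot`, `train_logistic`) are all hypothetical, and logistic regression fitted by plain gradient descent is assumed as the linear model.

```python
import math

def bin_index(value, edges):
    """Return the index of the bin that holds value; edges are ascending upper bounds."""
    for i, upper in enumerate(edges):
        if value <= upper:
            return i
    return len(edges)  # values above the last edge fall into the final bin

def one_hot(index, n_bins):
    """Encode a bin index as a 0/1 indicator vector (the feature engineering step)."""
    return [1.0 if i == index else 0.0 for i in range(n_bins)]

def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit a logistic regression (a linear model) with batch gradient descent."""
    n_features = len(rows[0])
    n = len(rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical toy data: ages and a made-up binary label.
ages = [22, 25, 31, 38, 45, 52, 58, 63]
labels = [0, 0, 0, 1, 1, 1, 1, 1]
age_edges = [30, 40, 50]  # hypothetical bins: (-inf,30], (30,40], (40,50], (50,+inf)

# Step 1: bin the raw data. Step 2: encode each bin. Step 3: train a linear model.
features = [one_hot(bin_index(a, age_edges), len(age_edges) + 1) for a in ages]
weights, bias = train_logistic(features, labels)
```

Each learned weight corresponds to one bin, which is what later allows a scorecard to express the model as points gained or lost per bin.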
Scorecard modeling is commonly used in credit assessment, such as assessment of risks in credit card repayment and credit assessment for loan disbursements. It is also used in other fields for scoring, such as customer service scoring and Alipay credit scoring.
The experiment described in this topic is based on the open source Default of Credit Card Clients Dataset. This dataset contains 30,000 data records. Each record includes the gender, education, marital status, age, credit card payment history, and credit card billing statements of a user.
The payment_next_month field is the label field. It indicates whether a user pays off the credit card debt. The value 1 indicates that the user will pay off the debt. The value 0 indicates that the user will not pay off the debt.
- Go to the Machine Learning Studio console.
  - Log on to the PAI console.
  - In the left-side navigation pane, choose .
  - On the PAI Visualization Modeling page, find the project in which you want to create an experiment and click Machine Learning in the Operation column.
- Create an experiment.
  - In the left-side navigation pane, click Home.
  - In the Templates section, click Create below [Scorecard] Credit Score Based on Credit Card Usage.
  - In the New Experiment dialog box, set the experiment parameters. You can use the default values of the parameters.

    | Parameter | Description |
    | --- | --- |
    | Name | The name of the experiment. Default value: [Scorecard] Credit Score Based on Credit Card Usage. The name must be 1 to 32 characters in length. Enter a name that meets this requirement, for example, Scorecard-based Credit Analysis. |
    | Project | The project in which you want to create the experiment. You cannot change the value of this parameter. |
    | Description | The description of the experiment. Default value: Use scorecard modeling to calculate credit scores based on credit card usage. |
    | Save To | The directory for storing the experiment. Default value: My Experiments. |

  - Click OK.
  - Optional: Wait about 10 seconds. Then, click Experiments in the left-side navigation pane.
  - Optional: Click Scorecard-based Credit Analysis_XX under My Experiments. The canvas of the experiment appears. My Experiments is the directory in which the experiment is stored, and Scorecard-based Credit Analysis_XX is the name of the experiment. In the experiment name, _XX is the ID that the system automatically generates for the experiment.
- View the components of the experiment on the canvas, as shown in the following figure.
The system automatically creates the experiment based on the preset template.
Area 1: The Split-1 component splits the source dataset into a training dataset and a prediction dataset.

Area 2: The Binning-1 component provides a feature that is similar to one-hot encoding. The component groups the input data into data classes (bins). The data values in each bin are replaced by a value, which is the representative of the bin. For example, the Binning-1 component groups the values of age into a number of age intervals, as shown in the following figure. After the experiment is run, right-click Binning-1 on the canvas and select View Data. In the dialog box that appears, view the binning result. In this example, each field falls into multiple intervals after data binning, as shown in the following figure.

Area 3: The Population Stability Index-1 component compares the stability of the population before data splitting, the stability of the population after data splitting, and the stability of the population after data binning. Then, the component calculates and returns population stability index (PSI) values for all features, as shown in the following figure.
PSI measures the stability of the population and is an important metric for identifying a shift in the population over a period of time. A PSI value that is smaller than 0.1 indicates insignificant changes. A PSI value between 0.1 and 0.25 indicates minor changes. A PSI value that is greater than 0.25 indicates major changes that require special attention.
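To make the thresholds concrete, the following sketch computes PSI with the commonly used definition, the sum over bins of (actual share - expected share) x ln(actual share / expected share). This is an assumption about the formula, not a statement of how the Population Stability Index-1 component is implemented. The function names and the bin counts are hypothetical.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index between two binned populations.

    expected_counts and actual_counts hold the number of records per bin,
    in the same bin order. eps guards against empty bins (log of zero).
    """
    total_e = sum(expected_counts)
    total_a = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        share_e = max(e / total_e, eps)
        share_a = max(a / total_a, eps)
        value += (share_a - share_e) * math.log(share_a / share_e)
    return value

def describe(psi_value):
    """Map a PSI value to the interpretation bands described above."""
    if psi_value < 0.1:
        return "insignificant change"
    if psi_value <= 0.25:
        return "minor change"
    return "major change"

# Hypothetical per-bin record counts for one feature.
train_counts = [200, 300, 300, 200]
test_counts = [190, 310, 290, 210]     # close to the training distribution
shifted_counts = [50, 150, 300, 500]   # strongly shifted toward the last bin

stable_label = describe(psi(train_counts, test_counts))
shifted_label = describe(psi(train_counts, shifted_counts))
```

With these made-up counts, the first comparison falls in the insignificant band and the second in the major band.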
Area 4: The Scorecard Training-1 component trains a scorecard model. The following figure shows the training result. A scorecard model uses normalized scores to indicate the weights of the involved features. The training result includes the following key parameters:
- intercept: the intercept.
- Unscaled: the original weight.
- Scaled: an index that indicates the number of points that a feature gains or loses. For example, if the pay_0 feature falls into the (-1,0] bin, 29 points are lost. If the pay_0 feature falls into the (0,1] bin, 27 points are gained.
- importance: the impact of each feature on the prediction result. A greater value indicates greater impact.
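The way Scaled points combine into a score can be sketched as follows. This assumes the usual scorecard convention that a user's score is a base value plus the points of the bin that each feature falls into; the source does not confirm that PAI computes scores exactly this way. Only the two pay_0 entries come from the example above. The age bins, their points, and `BASE_POINTS` are hypothetical.

```python
# Points per bin, keyed by feature. Each bin is (low, high], matching the
# interval notation used in the training result.
SCORECARD = {
    "pay_0": [((-1, 0), -29), ((0, 1), 27)],   # from the example above
    "age": [((20, 30), -5), ((30, 60), 8)],    # hypothetical bins and points
}
BASE_POINTS = 500  # hypothetical base score standing in for the scaled intercept

def points_for(feature, value):
    """Return the points of the bin that value falls into, or 0 if no bin matches."""
    for (low, high), points in SCORECARD[feature]:
        if low < value <= high:
            return points
    return 0  # outside all listed bins: no adjustment in this sketch

def credit_score(record):
    """Sum the base points and the per-feature bin points for one user."""
    return BASE_POINTS + sum(points_for(f, v) for f, v in record.items())

user_a = {"pay_0": 0, "age": 35}   # pay_0 in (-1,0] loses 29 points
user_b = {"pay_0": 1, "age": 25}   # pay_0 in (0,1] gains 27 points
score_a = credit_score(user_a)
score_b = credit_score(user_b)
```

Because every bin contributes a fixed number of points, the score of a user can be explained feature by feature.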
Area 5: The Scorecard Prediction-1 component uses the scorecard model to predict the credit score of each user. The Binary Classification Evaluation-1 component evaluates the quality of the model.
- Run the experiment and view the result.
  - In the top toolbar of the canvas, click Run.
  - After the experiment is run, right-click Scorecard Prediction-1 on the canvas and select View Data. In the dialog box that appears, view the credit score of each user.