Scorecard is a common modeling tool used in the credit risk assessment field. The scorecard performs binning to discretize variables and uses linear models, such as linear regression and logistic regression models, to train a model. The model training process includes feature selection and score transformation. The scorecard also allows you to add constraints to the variables during model training.

Note If you use the scorecard without binning, scorecard training is equivalent to logistic or linear regression.

Background information

The following terms are involved in scorecard training:
  • Feature engineering
    The main difference between the scorecard and normal linear models is that the scorecard performs feature engineering before it trains linear models. The Scorecard Training component supports the following methods for feature engineering:
    • The Binning component is used to discretize features. Then, one-hot encoding is performed for each variable based on the binning results to generate N dummy variables, where N is the number of bins for that variable.
      Note When you convert original variables into dummy variables, you can specify constraints for these dummy variables.
    • The Binning component is used to discretize features. Then, weight of evidence (WOE) conversion is performed to replace each original value of a variable with the WOE value of the bin into which the value falls, as illustrated in the sketch below.
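    The following minimal sketch illustrates the conventional WOE definition. Sign conventions for WOE vary, and the sample counts below are assumed values for illustration, not output of the component.
    import math

    # Conventional WOE definition (an illustration; the exact formula that
    # PAI uses may differ in convention):
    # WOE_i = log( (positives in bin i / total positives) /
    #              (negatives in bin i / total negatives) )
    def woe(pos_in_bin, neg_in_bin, pos_total, neg_total):
        return math.log((pos_in_bin / pos_total) / (neg_in_bin / neg_total))

    # A value that falls into a bin is replaced with that bin's WOE value.
    print(woe(30, 70, 100, 400))  # bin holds 30 of 100 positives, 70 of 400 negatives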
  • Score transformation
    In scenarios such as credit scoring, you must perform a linear transformation to convert the predicted sample odds into a score. The following formula is used for the linear transformation:
    log(odds) = a × score + b
    You can use the following parameters to specify the linear transformation relationship:
    • scaledValue: specifies a scaled score.
    • odds: specifies the odds of the scaled score.
    • pdo: specifies the number of points by which the score increases when the odds are doubled.
    For example, scaledValue is 800, odds is 50, and pdo is 25. In this case, the following two points on the line are determined:
    log(50)=a×800+b
    log(100)=a×825+b
    Solve these two equations for a and b, and then use a and b in the linear transformation to convert model outputs into scores.
    The scaling information is specified in the JSON format by using the -Dscale parameter. Example:
    {"scaledValue": 800, "odds": 50, "pdo": 25}
    If you specify the -Dscale parameter, you must also specify scaledValue, odds, and pdo.
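    The following minimal sketch, assuming the example values above, solves for a and b and maps odds back to scores. It illustrates the arithmetic only and is not the component's implementation.
    import math

    # From the two points: log(50) = a*800 + b and log(100) = a*825 + b
    scaled_value, odds, pdo = 800, 50, 25
    a = (math.log(2 * odds) - math.log(odds)) / pdo   # = log(2) / pdo
    b = math.log(odds) - a * scaled_value

    def odds_to_score(o):
        # Invert log(odds) = a * score + b to map predicted odds to a score.
        return (math.log(o) - b) / a

    print(odds_to_score(50))   # 800.0
    print(odds_to_score(100))  # 825.0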
  • Constraint addition during training
    During scorecard training, you can add constraints to variables. For example, you can fix the score of a specific bin to a given value, require the scores of two bins to meet a specific ratio, or impose an order on bin scores, such as sorting bin scores by WOE value. Constraints are implemented by the underlying constrained optimization algorithm. You can specify the constraints in the Binning component in the Machine Learning Platform for AI (PAI) console. After the constraints are specified, the Binning component generates JSON-formatted constraints and automatically passes them to the connected training component. The system supports the following JSON-formatted constraints:
    • "<": The weights of variables must be sorted in ascending order.
    • ">": The weights of variables must be sorted in descending order.
    • "=": The weight of a specific variable must be a fixed value.
    • "%": The weights of two variables must meet a proportional relationship.
    • "UP": the upper limit for the weights of variables.
    • "LO": the lower limit for the weights of variables.
    Each JSON-formatted constraint is stored in a table as a string. The table contains only one row and one column. Example:
    {
        "name": "feature0",
        "<": [
            [0,1,2,3]
        ],
        ">": [
            [4,5,6]
        ],
        "=": [
            "3:0","4:0.25"
        ],
        "%": [
            ["6:1.0","7:1.0"]
        ]
    }
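    In the preceding example, for the variable feature0, the weights of bins 0, 1, 2, and 3 must be in ascending order, the weights of bins 4, 5, and 6 must be in descending order, the weight of bin 3 is fixed to 0, the weight of bin 4 is fixed to 0.25, and the weights of bins 6 and 7 must meet a 1.0:1.0 ratio.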
  • Built-in constraints

    Each original variable has a built-in constraint: for each variable, the average score of the population must be 0. Because of this constraint, the scaled_weight value of the intercept entry in the scorecard model equals the average score of the population across all variables.
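    The following minimal check illustrates the built-in constraint; the bin counts and scaled weights are hypothetical values, not output of the component.
    # Hypothetical bin counts and scaled_weight values for one variable.
    counts = [120, 300, 80]        # samples per bin (assumed)
    scaled = [-12.5, 2.0, 11.25]   # scaled_weight per bin (assumed)

    # The built-in constraint requires the sample-weighted average to be zero.
    avg = sum(c * w for c, w in zip(counts, scaled)) / sum(counts)
    print(avg)  # 0.0 when the constraint holds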

  • Optimization algorithms
    On the Parameters Setting tab, select Advanced Options. Then, you can configure the optimization algorithm that is used during scorecard training. The system supports the following optimization algorithms:
    • L-BFGS: This algorithm is a first-order optimization algorithm that is suitable for processing large amounts of feature data. The algorithm does not support constraints. If you select this algorithm, the system automatically ignores the constraints that you specify.
    • Newton's method: This algorithm is a classic second-order optimization algorithm. It converges quickly and is accurate. However, the algorithm is not suitable for processing large amounts of feature data because it must compute the Hessian matrix of second-order derivatives. The algorithm does not support constraints. If you select this algorithm, the system automatically ignores the constraints that you specify.
    • Barrier method: This algorithm is a second-order optimization algorithm that supports constraints. If no constraints are specified, it is completely equivalent to Newton's method. The barrier method provides almost the same computing performance and accuracy as the SQP algorithm. In most cases, we recommend that you select SQP.
    • SQP: This algorithm is a second-order optimization algorithm that supports constraints. If no constraints are specified, it is completely equivalent to Newton's method. The SQP algorithm provides almost the same computing performance and accuracy as the barrier method. In most cases, we recommend that you select SQP.

    Note
    • L-BFGS and Newton's method are optimization algorithms without constraints. Barrier method and SQP are optimization algorithms with constraints.
    • If you are not familiar with optimization algorithms, we recommend that you set the Optimization Algorithm parameter to Auto-selected by default. In this case, the system selects the most appropriate algorithm based on the data amount and constraints.
  • Feature selection
    The Scorecard Training component supports stepwise feature selection, which is a combination of forward selection and backward selection. Each time the system performs a forward selection to add a new variable to the model, it also performs a backward selection to remove the variables whose significance no longer meets requirements. Stepwise feature selection works with various loss functions and feature transformation methods, and therefore supports multiple selection standards. The following standards are supported:
    • Marginal contribution: This standard can be applied to all functions and feature engineering methods.

      For this standard, two models must be trained: Model A and Model B. Model A does not contain Variable X, and Model B contains Variable X in addition to all the variables of Model A. The difference between the converged objective function values of the two models is the marginal contribution of Variable X given all the other variables in Model B. In scenarios where variables are converted into dummy variables, the marginal contribution of Variable X is the difference between the converged objective function values of the model that excludes all the dummy variables of Variable X and the model that includes them. Therefore, the marginal contribution standard is supported by all feature engineering methods.

      Marginal contribution is flexible and is not limited to a specific type of model: only variables that actually improve the objective function are passed to the model. Compared with statistical significance, marginal contribution has one disadvantage: statistical significance typically uses 0.05 as the threshold, whereas marginal contribution has no widely used threshold for beginners. We recommend that you set the threshold to 10E-5. The following sketch shows how a marginal contribution can be computed.
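      The following minimal sketch computes a marginal contribution as the difference between the converged logistic losses of two models. The synthetic data and the use of scikit-learn are assumptions for illustration, not the component's implementation.
      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import log_loss

      rng = np.random.default_rng(0)
      X = rng.normal(size=(1000, 4))                       # synthetic features
      y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=1000) > 0).astype(int)

      def converged_loss(features):
          # Train to convergence and return the final logistic loss.
          model = LogisticRegression().fit(features, y)
          return log_loss(y, model.predict_proba(features)[:, 1])

      loss_a = converged_loss(X[:, :3])                    # Model A: without column 3
      loss_b = converged_loss(X)                           # Model B: with column 3
      print("marginal contribution of column 3:", loss_a - loss_b)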

    • Score test: This standard supports only WOE conversion and logistic regression without feature engineering.

      During a forward selection, a model that contains only the intercept is trained first. In each subsequent iteration, the score chi-square of each variable that has not been added to the model is calculated, and the variable that has the largest score chi-square is added to the model. In addition, the p-value of that variable is calculated based on the chi-square distribution. If the p-value is greater than the given SLENTRY value, the variable is not added to the model, and feature selection is terminated.

      After the forward selection is complete, a backward selection is performed on the variables that have been added to the model. The Wald chi-square of each such variable and the related p-value are calculated. If the p-value of a variable is greater than the given SLSTAY value, the variable is removed from the model. Then, the system starts a new iteration.

    • F test: This standard supports only WOE conversion and linear regression without feature engineering.

      During a forward selection, a model that contains only the intercept is trained first. In each subsequent iteration, the F-values of the variables that are not in the model are calculated. F-value calculation is similar to marginal contribution calculation: two models must be trained to calculate the F-value of a variable. The F-value follows the F distribution, and the related p-value can be calculated based on the probability density function of the F distribution. If the p-value is greater than the given SLENTRY value, the variable is not added to the model, and the forward selection is terminated.

      During the backward selection, the F-value is used to evaluate the significance of a variable in a way similar to the score test. The sketch below outlines the stepwise loop that both tests share.
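      The following schematic sketch shows the stepwise loop shared by the score test and F test. The significance functions are placeholders that stand in for the score chi-square, Wald chi-square, or F-test p-values; they are not PAI APIs.
      def stepwise(all_vars, entry_p, stay_p, slentry=0.0001, slstay=0.0001, max_step=10):
          # entry_p(v, selected): p-value for adding v to the current model.
          # stay_p(v, selected): p-value for keeping v in the current model.
          selected = []                              # start with only the intercept
          for _ in range(max_step):
              candidates = [v for v in all_vars if v not in selected]
              if not candidates:
                  break
              best = min(candidates, key=lambda v: entry_p(v, selected))
              if entry_p(best, selected) > slentry:
                  break                              # no candidate is significant enough
              selected.append(best)                  # forward step
              # Backward step: drop variables that no longer meet SLSTAY.
              selected = [v for v in selected if stay_p(v, selected) <= slstay]
          return selected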

  • Forced selection of the variables that you want to pass to a model
    Before feature selection is performed, you can specify the variables that you want to forcibly pass to the model. No forward or backward selection is performed for these variables; they are passed to the model directly, regardless of their significance. You can specify the maximum number of iterations and the significance thresholds by using the -Dselected parameter in the JSON format. Example:
    {"max_step":2, "slentry": 0.0001, "slstay": 0.0001}
    If the -Dselected parameter is left empty or the max_step parameter is set to 0, no feature selection is performed.

Configure the component

You can configure the Scorecard Training component in the PAI console or by running a PAI command. The following code provides a sample PAI command:
pai -name=linear_model -project=algo_public
    -DinputTableName=input_data_table
    -DinputBinTableName=input_bin_table
    -DinputConstraintTableName=input_constraint_table
    -DoutputTableName=output_model_table
    -DlabelColName=label
    -DfeatureColNames=feaname1,feaname2
    -Doptimization=barrier_method
    -Dloss=logistic_regression
    -Dlifecycle=8
Parameter Description Required Default value
inputTableName The name of the input feature table. Yes N/A
inputTablePartitions The partitions that are selected from the input feature table. No Full table
inputBinTableName The name of the binning result table. If you specify this parameter, the system automatically performs discretization for features based on the binning rules in the binning result table. No N/A
featureColNames The feature columns that are selected from the input table. No All columns except the label column
labelColName The name of the label column. Yes N/A
outputTableName The name of the output model table. Yes N/A
inputConstraintTableName The name of the table that stores constraints. The constraints are a JSON string that is stored in a cell of the table. No N/A
optimization The optimization algorithm. Valid values:
  • lbfgs
  • newton
  • barrier_method
  • sqp
  • auto
Only sqp and barrier_method support constraints. If you set the optimization parameter to auto, the system automatically selects an appropriate optimization algorithm based on user data and related parameter settings. If you are not familiar with optimization algorithms, we recommend that you set the optimization parameter to auto.
No auto
loss The loss type. Valid values: logistic_regression and least_square. No logistic_regression
iterations The maximum number of iterations for optimization. No 100
l1Weight The parameter weight of L1 regularization. Only L-BFGS supports this parameter. No 0
l2Weight The parameter weight of L2 regularization. No 0
m The historical step size for optimization that is performed by using the L-BFGS algorithm. Only the L-BFGS algorithm supports this parameter. No 10
scale The weight scaling information of the scorecard. No Empty string
selected Specifies whether to enable feature selection during scorecard training. No Empty string
convergenceTolerance The convergence tolerance. No 1e-6
positiveLabel The label value that indicates positive samples. No 1
lifecycle The lifecycle of the output table. No N/A
coreNum The number of cores. No Determined by the system
memSizePerCore The memory size of each core. Unit: MB. No Determined by the system

Output

The Scorecard Training component generates a model report. The model report contains basic model evaluation statistics, such as binning information, binning constraints, WOE values, and marginal contribution values. The following table describes the columns in a model report.
Column Data type Description
feaname STRING The feature name.
binid BIGINT The bin ID.
bin STRING The description of the bin, which indicates the interval of the bin.
constraint STRING The constraints that are added to the bin during training.
weight DOUBLE The weight of a binning variable. For a non-scorecard model without binning, this field indicates the weight of a model variable.
scaled_weight DOUBLE The score that is linearly transformed from the weight of a binning variable in scorecard training.
woe DOUBLE A statistical metric. It indicates the WOE value of a bin in the training set.
contribution DOUBLE A statistical metric. It indicates the marginal contribution value of a bin in the training set.
total BIGINT A statistical metric. It indicates the total number of samples in a bin in the training set.
positive BIGINT A statistical metric. It indicates the number of positive samples in a bin in the training set.
negative BIGINT A statistical metric. It indicates the number of negative samples in a bin in the training set.
percentage_pos DOUBLE A statistical metric. It indicates the proportion of positive samples in a bin to total positive samples in the training set.
percentage_neg DOUBLE A statistical metric. It indicates the proportion of negative samples in a bin to total negative samples in the training set.
test_woe DOUBLE A statistical metric. It indicates the WOE value of a bin in the testing set.
test_contribution DOUBLE A statistical metric. It indicates the marginal contribution value of a bin in the testing set.
test_total BIGINT A statistical metric. It indicates the total number of samples in a bin in the testing set.
test_positive BIGINT A statistical metric. It indicates the number of positive samples in a bin in the testing set.
test_negative BIGINT A statistical metric. It indicates the number of negative samples in a bin in the testing set.
test_percentage_pos DOUBLE A statistical metric. It indicates the proportion of positive samples in a bin to total positive samples in the testing set.
test_percentage_neg DOUBLE A statistical metric. It indicates the proportion of negative samples in a bin to total negative samples in the testing set.