Platform For AI: Scorecard training

Last Updated: Apr 01, 2026

The Scorecard Training component trains credit scoring models using logistic or linear regression with built-in feature engineering. Unlike standard linear regression, it applies binning-based feature discretization before training, and optionally transforms features using weight of evidence (WOE). The training process supports score transformation, constraint-based optimization, and stepwise feature selection.

Without binning, scorecard training is equivalent to standard logistic or linear regression.

Limitations

The Scorecard Training component stores its model output in a MaxCompute temporary table. The default retention period in Machine Learning Studio is 369 days, which matches the temporary table lifecycle configured for the current workspace in Machine Learning Designer. For details, see Manage workspaces.

To retain the model beyond the default retention period, persist it with the Write Table component. For more information, see FAQ about algorithm components.

Key concepts

Feature engineering

The main difference between scorecard training and standard linear modeling is that scorecard training applies feature engineering before fitting the model. The Binning component supports two approaches:

  • One-hot encoding: Bins each variable and generates N dummy variables (N = number of bins). You can apply constraints to individual dummy variables.

  • WOE conversion: Replaces the original variable value with the weight of evidence (WOE) value of its bin. This encodes predictive information directly into the feature.
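As a rough illustration of WOE conversion, the sketch below computes a WOE value per bin from positive and negative sample counts and then replaces a raw value with the WOE of its bin. The sign convention (log of the positive share over the negative share), the function names, the cut points, and the counts are all assumptions made for this example; the Binning component may use a different convention.

```python
import bisect
import math

def woe_by_bin(pos_counts, neg_counts):
    """WOE per bin, assuming WOE = ln(share of positives / share of negatives).

    Assumes every bin holds at least one positive and one negative sample.
    """
    total_pos = sum(pos_counts)
    total_neg = sum(neg_counts)
    return [
        math.log((pos / total_pos) / (neg / total_neg))
        for pos, neg in zip(pos_counts, neg_counts)
    ]

def to_woe(value, cut_points, woe):
    """Replace a raw value with the WOE of its bin.

    With cut_points [30, 50] the bins are (-inf, 30), [30, 50), [50, +inf).
    """
    return woe[bisect.bisect_right(cut_points, value)]

# Hypothetical counts for three bins of one variable.
pos_counts = [10, 30, 60]
neg_counts = [50, 30, 20]
woe = woe_by_bin(pos_counts, neg_counts)
encoded = [to_woe(v, [30, 50], woe) for v in (25, 30, 72)]
```

A bin whose share of positives equals its share of negatives gets a WOE of 0, so the encoded feature carries no signal for that bin.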

Score transformation

In credit scoring, raw model predictions (log-odds) must be converted into interpretable scores. Scorecard Training applies a linear transformation using three parameters:

| Parameter | Description |
|---|---|
| scaledValue | The score assigned at a reference odds value |
| odds | The odds corresponding to scaledValue |
| pdo | Points to double the odds |

These three parameters define two points on the score line. For example, with scaledValue=800, odds=50, and pdo=25:

log(50)  = a × 800 + b
log(100) = a × 825 + b

Solve for a and b to derive the linear mapping, then apply it to transform model weights into scores.

Pass the transformation configuration using the -Dscale parameter in JSON format:

{"scaledValue": 800, "odds": 50, "pdo": 25}

All three fields are required when -Dscale is specified.
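The two-point construction described above can be worked through in a few lines. This is a minimal sketch, assuming natural logarithms and the relation log(odds) = a × score + b from the example equations; the function names are hypothetical and not part of the component.

```python
import json
import math

def solve_scale(scale_json):
    """Derive a and b in log(odds) = a * score + b from a -Dscale style JSON string."""
    cfg = json.loads(scale_json)
    scaled_value, odds, pdo = cfg["scaledValue"], cfg["odds"], cfg["pdo"]
    # Subtracting the two equations gives log(2 * odds) - log(odds) = a * pdo.
    a = math.log(2) / pdo
    # Substitute a back into log(odds) = a * scaledValue + b.
    b = math.log(odds) - a * scaled_value
    return a, b

def log_odds_to_score(log_odds, a, b):
    """Invert the linear mapping to turn log-odds into a score."""
    return (log_odds - b) / a

a, b = solve_scale('{"scaledValue": 800, "odds": 50, "pdo": 25}')
score_at_50 = log_odds_to_score(math.log(50), a, b)    # reference point
score_at_100 = log_odds_to_score(math.log(100), a, b)  # odds doubled
```

With these values, doubling the odds from 50 to 100 moves the score from 800 to 825, which is the pdo of 25 points.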

Constraints

During training, you can add constraints to control how variable weights are learned. Specify constraints in the Binning component — they are automatically passed to Scorecard Training as a JSON string stored in a single-cell table (inputConstraintTableName).

Optimization algorithms

On the Parameters Setting tab, select Advanced Options to configure the optimization algorithm.

| Algorithm | Order | Supports constraints | Best for |
|---|---|---|---|
| L-BFGS | First | No | Large feature sets |
| Newton's method | Second | No | Small to medium feature sets; fast convergence |
| Barrier method | Second | Yes | Equivalent to SQP in most cases |
| SQP | Second | Yes | General use with constraints (recommended) |

If you are unfamiliar with optimization algorithms, set Optimization Method to Auto Selection. The system selects the most appropriate algorithm based on data volume and whether constraints are present.

Feature selection

The component supports stepwise feature selection — a combination of forward and backward selection. After each forward step (adding a variable), a backward pass removes any variables that no longer meet the significance threshold.

Use the following table to choose the right selection standard for your setup:

| Selection standard | Feature engineering | Model type | Threshold |
|---|---|---|---|
| Marginal contribution | One-hot or WOE (any) | Any | Recommended starting point: 10E-5 |
| Score test | WOE conversion only | Logistic regression | SLENTRY (forward), SLSTAY (backward) via chi-square |
| F test | WOE conversion only | Linear regression | SLENTRY (forward), SLSTAY (backward) via F distribution |

Marginal contribution measures the difference in objective function value between a model with and without variable X. It applies to all feature engineering and model type combinations, making it the most flexible standard.

Score test: In each forward iteration, the score chi-square of each candidate variable is computed. The variable with the highest chi-square is added, provided its p-value is below SLENTRY. The backward pass uses the Wald chi-square; variables with a p-value above SLSTAY are removed.

F test: Similar to the score test, but based on the F distribution. In the forward step, F-values are computed for each candidate variable, and variables with a p-value above SLENTRY are excluded. The backward pass applies the same F-value logic, using SLSTAY as the threshold.

Forced variable selection: Specify variables to include unconditionally. No forward or backward selection is applied to them.

Configure stepwise feature selection with the -Dselected parameter in JSON format, where slentry and slstay set the SLENTRY and SLSTAY thresholds:

{"max_step": 2, "slentry": 0.0001, "slstay": 0.0001}

If -Dselected is left blank or max_step is set to 0, no feature selection is performed.
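To make the forward and backward steps concrete, here is a minimal sketch of stepwise selection driven by marginal contribution. It is an illustration only, not the component's implementation: it assumes a least-squares objective (mean squared error), a single hypothetical threshold for both directions, and synthetic data in which only columns 0 and 2 influence the label.

```python
import random

def solve_linear(M, v):
    """Solve M w = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * z for x, z in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def objective(X, y, cols):
    """Mean squared error of a least-squares fit on `cols` plus an intercept."""
    A = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(A[0])
    gram = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * t for r, t in zip(A, y)) for i in range(k)]
    w = solve_linear(gram, rhs)
    return sum((t - sum(wi * ri for wi, ri in zip(w, r))) ** 2
               for r, t in zip(A, y)) / len(y)

def stepwise_select(X, y, threshold=1e-4, max_step=10):
    """Forward selection with a backward pass after every added variable."""
    selected = []
    for _ in range(max_step):
        candidates = [j for j in range(len(X[0])) if j not in selected]
        if not candidates:
            break
        base = objective(X, y, selected)
        # Forward step: marginal contribution = objective without minus with the variable.
        gains = {j: base - objective(X, y, selected + [j]) for j in candidates}
        best = max(gains, key=gains.get)
        if gains[best] < threshold:
            break
        selected.append(best)
        # Backward pass: drop variables whose contribution fell below the threshold.
        changed = True
        while changed:
            changed = False
            full = objective(X, y, selected)
            for j in selected:
                rest = [k for k in selected if k != j]
                if objective(X, y, rest) - full < threshold:
                    selected, changed = rest, True
                    break
    return selected

# Synthetic data: the label depends on columns 0 and 2 only.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 * r[0] - 3.0 * r[2] + random.gauss(0, 0.01) for r in X]
selected = stepwise_select(X, y)
```

Setting max_step to 0 in this sketch returns an empty selection immediately, which mirrors the behavior described for -Dselected.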

Parameters

Configure the Scorecard Training component through the Machine Learning Designer UI or by running a PAI command directly. Sample command:

pai -name=linear_model -project=algo_public
    -DinputTableName=input_data_table
    -DinputBinTableName=input_bin_table
    -DinputConstraintTableName=input_constraint_table
    -DoutputTableName=output_model_table
    -DlabelColName=label
    -DfeatureColNames=feaname1,feaname2
    -Doptimization=barrier_method
    -Dloss=logistic_regression
    -Dlifecycle=8

| Parameter | Required | Default | Description |
|---|---|---|---|
| inputTableName | Yes | None | Name of the input feature table |
| labelColName | Yes | None | Name of the label column |
| outputTableName | Yes | None | Name of the output model table |
| inputTablePartitions | No | Full table | Partitions to read from the input feature table |
| inputBinTableName | No | None | Binning result table; triggers automatic feature discretization based on binning rules |
| featureColNames | No | All non-label columns | Feature columns to include in training |
| inputConstraintTableName | No | None | Table containing constraint JSON (one cell) |
| optimization | No | auto | Optimization algorithm. Valid values: lbfgs, newton, barrier_method, sqp, auto |
| loss | No | logistic_regression | Loss function. Valid values: logistic_regression, least_square |
| iterations | No | 100 | Maximum number of optimization iterations |
| l1Weight | No | 0 | L1 regularization weight. Valid only when optimization=lbfgs |
| l2Weight | No | 0 | L2 regularization weight |
| m | No | 10 | Historical step size for L-BFGS. Valid only when optimization=lbfgs |
| scale | No | None | Score transformation configuration in JSON format |
| selected | No | None | Feature selection configuration in JSON format |
| convergenceTolerance | No | 1e-6 | Convergence tolerance |
| positiveLabel | No | 1 | Label value for positive samples |
| lifecycle | No | None | Lifecycle of the output table (days) |
| coreNum | No | System-determined | Number of cores |
| memSizePerCore | No | System-determined | Memory per core (MB) |
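When many of these parameters are set at once, it can help to assemble the command from a dictionary so that the JSON-valued parameters (scale and selected) are serialized consistently. The helper below is a hypothetical sketch that only builds a string; wrapping the JSON values in single quotes is an assumption about the shell or client in use, so adjust the quoting to your environment. The table names are placeholders.

```python
import json

def build_pai_command(params):
    """Assemble a `pai -name=linear_model` command string from a dict of -D parameters."""
    parts = ["pai -name=linear_model -project=algo_public"]
    for key, value in params.items():
        if isinstance(value, dict):
            # JSON-valued parameters such as scale and selected (quoting is an assumption).
            value = "'" + json.dumps(value) + "'"
        elif isinstance(value, (list, tuple)):
            # List-valued parameters such as featureColNames are comma-separated.
            value = ",".join(value)
        parts.append(f"-D{key}={value}")
    return "\n    ".join(parts)

command = build_pai_command({
    "inputTableName": "input_data_table",      # placeholder table names
    "inputBinTableName": "input_bin_table",
    "outputTableName": "output_model_table",
    "labelColName": "label",
    "featureColNames": ["feaname1", "feaname2"],
    "loss": "logistic_regression",
    "scale": {"scaledValue": 800, "odds": 50, "pdo": 25},
    "selected": {"max_step": 2, "slentry": 0.0001, "slstay": 0.0001},
})
print(command)
```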

Output

The Scorecard Training component generates a model report with evaluation statistics for each feature bin. The report contains three groups of fields:

  • Feature and bin metadata: feaname, binid, bin, constraint, weight, scaled_weight

  • Training set statistics: woe, contribution, total, positive, negative, percentage_pos, percentage_neg

  • Testing set statistics: test_woe, test_contribution, test_total, test_positive, test_negative, test_percentage_pos, test_percentage_neg

Full column reference:

| Column | Type | Description |
|---|---|---|
| feaname | STRING | Feature name |
| binid | BIGINT | Bin ID |
| bin | STRING | Bin interval description |
| constraint | STRING | Constraints applied to the bin during training |
| weight | DOUBLE | Bin weight. For non-scorecard models without binning, this is the model variable weight |
| scaled_weight | DOUBLE | Score linearly transformed from the bin weight (scorecard training only) |
| woe | DOUBLE | WOE value of the bin in the training set |
| contribution | DOUBLE | Marginal contribution of the bin in the training set |
| total | BIGINT | Total samples in the bin (training set) |
| positive | BIGINT | Positive samples in the bin (training set) |
| negative | BIGINT | Negative samples in the bin (training set) |
| percentage_pos | DOUBLE | Proportion of the bin's positive samples to total positive samples (training set) |
| percentage_neg | DOUBLE | Proportion of the bin's negative samples to total negative samples (training set) |
| test_woe | DOUBLE | WOE value of the bin in the testing set |
| test_contribution | DOUBLE | Marginal contribution of the bin in the testing set |
| test_total | BIGINT | Total samples in the bin (testing set) |
| test_positive | BIGINT | Positive samples in the bin (testing set) |
| test_negative | BIGINT | Negative samples in the bin (testing set) |
| test_percentage_pos | DOUBLE | Proportion of the bin's positive samples to total positive samples (testing set) |
| test_percentage_neg | DOUBLE | Proportion of the bin's negative samples to total negative samples (testing set) |