
Platform For AI: Reject Inference

Last Updated: Jan 13, 2025

Reject Inference (RI) is a commonly used technique in financial risk management, primarily aimed at mitigating sample selection bias and enhancing the accuracy and reliability of credit assessment models. The core idea of reject inference is to utilize information from accepted customers (those who have passed the approval process) to infer the risk characteristics of rejected customers (those who have not passed the approval process), thereby providing a more comprehensive evaluation of credit risk.

Algorithm

In credit scenarios, scorecard models of user repayments and defaults are trained only on the data of applicants who were granted a loan; the data of applicants who were rejected is not included. This leads to inaccurate prediction results, which in most cases are overly optimistic. You can use the Reject Inference algorithm to handle this issue.

Based on the input training data, the Reject Inference component adds labels to the data that has prediction results but no actual label. The training data, also known as the accept data, contains both the actual label and the prediction results. The data without the actual label is known as the rejection data. The algorithm provides the following four reject inference methods; the sketches after the list illustrate how each method assigns labels or weights.
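
The split between accept data and rejection data can be pictured as two tables that share the scorecard prediction column, with the actual label present only in the accept data. The following minimal sketch uses illustrative column names (prediction_score, label) rather than a fixed schema:

```python
import pandas as pd

# Accept data: applicants who passed approval. It carries the actual label
# and the prediction score produced by the upstream scorecard component.
accept_data = pd.DataFrame({
    "prediction_score": [0.92, 0.75, 0.40, 0.88],  # predicted probability of a good sample
    "label": [1, 1, 0, 1],                         # 1 = good (repaid), 0 = bad (defaulted)
})

# Rejection data: applicants who were declined. It has the same score column
# but no actual label; reject inference fills the labels in.
rejection_data = pd.DataFrame({
    "prediction_score": [0.35, 0.55, 0.20],
})
```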

  • fuzzy

    The fuzzy method augments the dataset by adding both a good label and a bad label to each sample in the rejection data. The sample weight of each labeled copy is calculated from p, the probability of a good sample predicted by the scorecard component in the previous step, together with two parameters that you can specify:

    • rejection rate: the rejection rate of all data.

    • event rate increase: the ratio of the probability of bad samples in the rejection data to the probability of bad samples in the accept data.

  • hard cut-off

    The hard cut-off method requires you to set a threshold score based on the result of the scorecard model in the previous step and on your risk tolerance for rejected users. The system adds the bad sample label to samples with scores lower than the threshold and the good sample label to samples with scores higher than the threshold.

  • parcelling

    The parcelling method groups the accept data into buckets based on the prediction results of the scorecard model in the previous step and calculates the default rate of each bucket. The system then groups the rejection data in the same way, uses the default rate of each bucket as the sampling rate, and randomly selects samples in the bucket as default samples. The selected samples are labeled as bad samples, and the rest are labeled as good samples.

  • two stage

    The two stage method requires the prediction results of the scorecard model in the previous step (GoodBadScore), as well as the acceptance probability of each sample output by the model prediction component in the previous step (AcceptRejectScore). The two stage method corrects the prediction results of the scorecard model on the unlabeled samples by fitting the linear relationship between AcceptRejectScore and GoodBadScore, and then adds labels to the samples by using the parcelling method.
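
As an illustration of the fuzzy method, the sketch below duplicates each rejected sample into a copy labeled good and a copy labeled bad and assigns each copy a weight. The exact weight formula used by the component is not reproduced here; the form below, which scales the bad-label weight by the event rate increase and both weights by the rejection rate, is an assumption meant only to show the mechanics.

```python
import pandas as pd

def fuzzy_augment(rejection_data: pd.DataFrame,
                  rejection_rate: float = 0.3,
                  event_rate_increase: float = 1.0) -> pd.DataFrame:
    """Duplicate each rejected sample into a good copy and a bad copy.

    Assumed weight form (illustrative, not the component's exact formula):
        w_good = rejection_rate * p
        w_bad  = rejection_rate * event_rate_increase * (1 - p)
    where p is the probability of a good sample predicted by the scorecard model.
    """
    p = rejection_data["prediction_score"]

    good = rejection_data.copy()
    good["label"] = 1
    good["weight"] = rejection_rate * p

    bad = rejection_data.copy()
    bad["label"] = 0
    bad["weight"] = rejection_rate * event_rate_increase * (1 - p)

    return pd.concat([good, bad], ignore_index=True)
```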
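
The hard cut-off and parcelling rules can be sketched in a similar way. The quantile-based bucketing and the random sampling below are illustrative assumptions; in the component they are controlled by the buckets number, event rate increase, and seed parameters.

```python
import numpy as np
import pandas as pd

def hard_cutoff_label(rejection_data: pd.DataFrame, cutoff_score: float) -> pd.Series:
    # Samples scoring above the threshold are labeled good (1), the rest bad (0).
    return (rejection_data["prediction_score"] > cutoff_score).astype(int)

def parcelling_label(accept_data: pd.DataFrame,
                     rejection_data: pd.DataFrame,
                     buckets_number: int = 25,
                     event_rate_increase: float = 1.0,
                     seed: int = 0) -> pd.Series:
    rng = np.random.default_rng(seed)

    # Bucket both datasets by the scorecard prediction, using edges derived
    # from the accept data so that both datasets share the same buckets.
    edges = np.quantile(accept_data["prediction_score"],
                        np.linspace(0, 1, buckets_number + 1))[1:-1]
    accept_bucket = np.digitize(accept_data["prediction_score"], edges)
    reject_bucket = np.digitize(rejection_data["prediction_score"], edges)

    # Default (bad) rate of the accept data per bucket, scaled by event rate increase.
    bad_rate = ((accept_data["label"] == 0)
                .groupby(accept_bucket).mean() * event_rate_increase).clip(0, 1)

    # Use the scaled bad rate as the sampling rate inside each rejection bucket:
    # the sampled share is labeled bad (0), the rest good (1).
    sampling_rate = pd.Series(reject_bucket).map(bad_rate).fillna(bad_rate.mean())
    is_good = rng.random(len(rejection_data)) >= sampling_rate.to_numpy()
    return pd.Series(is_good.astype(int), index=rejection_data.index)
```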
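
For the two stage method, the correction step can be pictured as fitting a simple linear relationship between the two scores on the accept data and using it to adjust the good/bad score of the rejected samples, which is then handed to the parcelling step above. The column names and the ordinary least squares fit below are illustrative assumptions, not the component's internal procedure.

```python
import numpy as np
import pandas as pd

def two_stage_adjust(accept_data: pd.DataFrame,
                     rejection_data: pd.DataFrame) -> pd.Series:
    """Adjust the good/bad score of rejected samples using the accept rate score.

    Assumes both frames carry a good_bad_score column (scorecard prediction) and
    an accept_rate_score column (acceptance probability from the model prediction
    component); these names are illustrative.
    """
    # Fit good_bad_score ~ a * accept_rate_score + b on the accept data.
    a, b = np.polyfit(accept_data["accept_rate_score"],
                      accept_data["good_bad_score"], deg=1)

    # Apply the fitted line to the rejection data to obtain a corrected
    # good/bad score, which the parcelling step then uses to assign labels.
    return a * rejection_data["accept_rate_score"] + b
```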

Inputs and outputs

Input ports

Output port

The output type is a MaxCompute table, with downstream components including Scorecard Training and Binning.

Configure the component

The following parameters are grouped by the tab on which they appear. Each entry states whether the parameter is required, describes it, and lists its default value.

Field Setting

  • good/bad score column (required): The prediction results column of the scorecard component. In most cases, this is the prediction_score column output by the scorecard component. The accept data is labeled as good or bad based on this column. No default value.

  • actual label column (required): The name of the actual label column of the accept data. No default value.

  • weight column (optional): The name of the weight column. No default value.

  • accept rate score column (optional): The acceptance probability of the predicted samples. In most cases, this column is output by the scorecard component. The data is labeled as accepted or rejected based on this column. This parameter is required if you set inference method to two stage. No default value.

Parameter Setting

  • inference method (optional): The reject inference method. Valid values: fuzzy, hard cut-off, parcelling, and two stage. Default value: fuzzy.

  • rejection rate (required): The rejection probability of a sample. Default value: 0.3.

  • buckets number (optional): The number of buckets for training. This parameter is required if you set inference method to parcelling or two stage. Default value: 25.

  • cutoff score (optional): The threshold score. The system adds the good sample label to samples with scores higher than the threshold and the bad sample label to samples with scores lower than the threshold. This parameter is required if you set inference method to hard cut-off. No default value.

  • event rate increase (optional): A scaling factor whose meaning depends on the selected inference method. This parameter is required if you set inference method to fuzzy, parcelling, or two stage. Default value: 1.0.

    • When you use the parcelling or two stage method, the quality of the accept data may be better than that of the rejection data in the same bucket. For example, if event rate increase is set to 1.5 and 30% of the accept data in a bucket is bad, the probability of bad samples in the rejection data of that bucket is estimated as 30% × 1.5 = 45%. The system then randomly adds the bad label to 45% of the samples in that part of the rejection data.

    • When you use the fuzzy method, this parameter influences the sample weights. For more information, see the description of the fuzzy method in the preceding Algorithm section.

  • seed (optional): The random seed used when the system randomly assigns labels. This parameter is required if you set inference method to parcelling. Default value: 0.

  • score range method (optional): Valid values: accepts, rejects, and augmentation. This parameter is required if you set inference method to parcelling or two stage. Default value: augmentation.

  • Score Conversion (required): If you select Score Conversion, you must also set the scaledValue, odds, and pdo parameters. For more information, see Scorecard Training. Default value: false.

  • scaledValue (optional): For more information, see Scorecard Training. No default value.

  • odds (optional): For more information, see Scorecard Training. No default value.

  • pdo (optional): For more information, see Scorecard Training. No default value.

Execution Tuning

  • Choose Running Mode (required): The type of the resources used to run the job. Default value: MaxCompute.

  • Number of Workers (optional): The number of nodes on which the job runs. The value must be a positive integer. Valid values: [1,9999]. No default value.

  • Memory per worker, unit MB (optional): The memory size of each worker node. Unit: MB. Valid values: [1024,65536]. No default value.