This topic describes the Population Stability Index (PSI) feature. PSI measures the difference between the expected distribution and the actual distribution of a variable and is used to evaluate the stability of a model.
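For reference, PSI is conventionally computed by binning a variable, taking the share of records that falls into each bin in the expected and the actual data, and summing (actual - expected) × ln(actual / expected) over the bins. The following Python sketch illustrates that common definition only; it is not the implementation used by the service. The function name `psi` and the `eps` floor for empty bins are assumptions made for this example.

```python
import math

def psi(expected_props, actual_props, eps=1e-6):
    """Population Stability Index from per-bin proportions.

    Both inputs list the share of records in each bin, in the same bin order.
    eps is an assumed floor that keeps empty bins from producing log(0).
    """
    if len(expected_props) != len(actual_props):
        raise ValueError("both distributions must use the same bins")
    total = 0.0
    for e, a in zip(expected_props, actual_props):
        e = max(e, eps)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total

# Identical distributions give 0; the value grows as the shares drift apart.
same = psi([0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25])
shifted = psi([0.25, 0.25, 0.25, 0.25], [0.10, 0.20, 0.30, 0.40])
```

Each term in the sum is non-negative, so a larger PSI always indicates a larger shift between the two distributions.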

Scenarios

PSI is commonly used in risk control, where stability is an important factor in determining whether a model remains effective. Most risk control models stay in use for more than a year before they are replaced. If a model is unstable over that period, the quality of the decisions based on it may degrade.

Syntax

CREATE FEATURE feature_name WITH ( feature_class = '', x_cols = '', parameters=())
Parameter description:
feature_name: The name of the feature.
feature_class: The type of the feature. Set the value to psi.
x_cols: The list of independent variables. Separate multiple variables with commas (,).
parameters: Custom parameters for creating the feature. The following parameters are supported:
  • actual_table: the table for the actual distribution of the variable.
  • predict_table: the table for the expected distribution of the variable.
  • bins_num: the number of bins. This is an optional parameter. If you specify this parameter, you must also specify bins_method. If you do not specify this parameter, the system determines the number of bins based on the binning method and the actual data.
  • bins_method: the data binning method. Default value: chi. Valid values:
    • chi: the chi-square method.
    • quantile: the equal frequency method.
    • step: the equal step method.
    • dt: the decision tree method.
    • kmean: the k-means clustering method.
  • categorical_feature: the categorical features. Separate multiple features with commas (,).
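To make the difference between two of the binning methods concrete, the following Python sketch computes bin edges with an equal-step rule and with an equal-frequency rule. It is an illustration under assumed definitions, and the exact edge handling in the service may differ. The helper names `step_edges`, `quantile_edges`, and `bin_shares` and the sample data are hypothetical.

```python
def step_edges(values, bins_num):
    """Equal-step edges: split [min, max] into bins_num intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins_num
    return [lo + i * width for i in range(1, bins_num)]

def quantile_edges(values, bins_num):
    """Equal-frequency edges: cut the sorted values so each bin holds about the same count."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // bins_num] for i in range(1, bins_num)]

def bin_shares(values, edges):
    """Share of values per bin; a value equal to an edge goes to the upper bin."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        idx = sum(1 for e in edges if v >= e)
        counts[idx] += 1
    return [c / len(values) for c in counts]

# Skewed sample data (made up for illustration).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
step = step_edges(data, 2)        # one edge at the midpoint of the value range
quant = quantile_edges(data, 2)   # one edge at the median position
```

On skewed data such as this sample, the equal-step rule places almost all records in one bin, while the equal-frequency rule splits them evenly. The per-bin shares returned by `bin_shares` are the kind of input from which a PSI value is computed.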

Example

/*polar4ai*/CREATE FEATURE psi_001 WITH (
    feature_class = 'psi',
    x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
    parameters = (
        actual_table = 'airlines_train_1000',
        predict_table = 'airlines_test_1000',
        bins_num = 10,
        bins_method = 'quantile',
        categorical_feature = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek'
    )
);