The population stability index (PSI) is a metric that measures how much the distribution of a population shifts between two samples.

Background information

PSI is a common metric that is used to measure the stability of samples. For example, you can use it to check whether a population remains stable between two months. A PSI value less than 0.1 indicates insignificant changes. A PSI value from 0.1 to 0.25 indicates minor changes. A PSI value greater than 0.25 indicates major changes.
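The thresholds above can be expressed as a small lookup. The following is a minimal sketch; the function name `interpret_psi` and the returned labels are illustrative assumptions and are not part of the component.

```python
def interpret_psi(value):
    """Map a PSI value to the stability bands described above.

    Hypothetical helper: the name and labels are assumptions for illustration.
    """
    if value < 0.1:
        return "insignificant"  # PSI less than 0.1
    if value <= 0.25:
        return "minor"          # PSI from 0.1 to 0.25
    return "major"              # PSI greater than 0.25


for psi_value in (0.03, 0.18, 0.4):
    print(psi_value, interpret_psi(psi_value))
```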

If a population is unstable over time, you can use charts to identify the changes. You can bin the variables, calculate the number and proportion of samples in each bin, and then present the statistics in a column chart. The following figure shows a sample chart.

[Figure: Column chart]

The preceding method directly shows whether a variable changes significantly between two samples. However, it does not quantify the shift, so population stability cannot be automatically monitored. To resolve this issue, you can use the Population Stability Index component. The following figure shows the formula that is used to calculate PSI values.

[Figure: PSI calculation formula]
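To make the calculation concrete, the following sketch computes PSI for one numeric variable in plain Python. It assumes the commonly used definition, PSI = sum over bins of (test_i - base_i) * ln(test_i / base_i), where base_i and test_i are the proportions of samples in bin i. Whether the component uses exactly this form is an assumption. The function names, the bin edges, and the `eps` floor that keeps empty bins from producing log(0) are all illustrative and are not taken from the component.

```python
import math
from bisect import bisect_right


def bin_proportions(values, edges):
    """Return the share of values in each bin.

    `edges` is a sorted list of inner cut points, so len(edges) + 1 bins
    are produced. A value equal to an edge falls into the upper bin.
    """
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(base, test, edges, eps=1e-6):
    """Sketch of PSI under the assumed formula; eps floors empty bins."""
    base_pct = bin_proportions(base, edges)
    test_pct = bin_proportions(test, edges)
    total = 0.0
    for b, t in zip(base_pct, test_pct):
        b = max(b, eps)  # avoid division by zero and log(0)
        t = max(t, eps)
        total += (t - b) * math.log(t / b)
    return total


base_sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
test_sample = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
cut_points = [3.5, 7.5]  # hypothetical bin edges

print(psi(base_sample, base_sample, cut_points))  # identical samples
print(psi(base_sample, test_sample, cut_points))  # shifted sample
```

Identical samples yield a PSI of 0, and the value grows as the bin proportions diverge. The `eps` floor strongly influences the result when a bin is empty in one sample, so treat the exact number in that case with caution.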

Configure the component

You can use one of the following methods to configure the Population Stability Index component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Population Stability Index component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.

| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Features for PSI Calculation | The feature columns that are required for PSI value calculation. |
| Tuning | Cores | The number of CPU cores that are required. By default, the system determines the value. |
| Tuning | Memory Size | The memory size of each CPU core. By default, the system determines the value. |

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name psi
-project algo_public
-DinputBaseTableName=psi_base_table
-DinputTestTableName=psi_test_table
-DoutputTableName=psi_bin_table
-DinputBinTableName=pai_index_table
-DfeatureColNames=fea1,fea2,fea3
-Dlifecycle=7

| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| inputBaseTableName | The name of the base table. The shift of the population is calculated based on the samples in the base and test tables. | Yes | No default value |
| inputBaseTablePartitions | The partitions that are selected from the base table. | No | Full table |
| inputTestTableName | The name of the test table. The shift of the population is calculated based on the samples in the base and test tables. | Yes | No default value |
| inputTestTablePartitions | The partitions that are selected from the test table. | No | Full table |
| inputBinTableName | The name of the binning result table. | Yes | No default value |
| featureColNames | The feature columns that are required for PSI value calculation. | No | Full table |
| outputTableName | The name of the output table. | Yes | No default value |
| lifecycle | The lifecycle of the output table. | No | No default value |
| coreNum | The number of CPU cores that are required. | No | Determined by the system |
| memSizePerCore | The memory size of each CPU core. Unit: MB. | No | Determined by the system |
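
If you generate the command text programmatically, a small helper can check the required parameters before you paste the result into the SQL Script component. The following is a minimal sketch: `build_psi_command` is a hypothetical function, it only assembles text in the `-D<name>=<value>` pattern shown in the command above, and it does not submit anything to PAI.

```python
# Parameters marked as required in the table above.
REQUIRED_PARAMS = (
    "inputBaseTableName",
    "inputTestTableName",
    "inputBinTableName",
    "outputTableName",
)


def build_psi_command(params, project="algo_public"):
    """Assemble the PAI command text for the psi component.

    Hypothetical helper: it only formats strings and raises ValueError
    if a required parameter is missing or empty.
    """
    missing = [name for name in REQUIRED_PARAMS if not params.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    lines = ["PAI -name psi", f"-project {project}"]
    for name, value in params.items():
        if value is None:
            continue  # skip optional parameters that were left unset
        lines.append(f"-D{name}={value}")
    return "\n".join(lines)


print(build_psi_command({
    "inputBaseTableName": "psi_base_table",
    "inputTestTableName": "psi_test_table",
    "outputTableName": "psi_bin_table",
    "inputBinTableName": "pai_index_table",
    "featureColNames": "fea1,fea2,fea3",
    "lifecycle": 7,
    "coreNum": None,  # left to the system
}))
```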

Example

Use the Binning component to perform binning for features. Then, connect the Population Stability Index component to the two sample datasets that you want to compare and to the Binning component, as shown in the following figure. Specify the Features for PSI Calculation parameter.

[Figure: Example on how to use PSI]

The following figure shows the calculation results of the Population Stability Index component.

[Figure: Calculation results]