The population stability index (PSI) is a statistical metric used to assess the difference between two sample distributions. It is commonly used to monitor model performance stability over time or across different environments. PSI calculates the distribution difference between two samples to help identify potential data shifts or drifts, providing a basis for model maintenance and updates.
Algorithm description
The population stability index (PSI) measures sample stability. For example, use it to determine if the distribution of a sample has changed significantly between two months. A PSI value less than 0.1 indicates that changes are not significant. A PSI value between 0.1 and 0.25 indicates a noticeable change. A PSI value greater than 0.25 indicates a major change that requires special attention.
Measure sample stability at different times by creating a chart. To do this, discretize the variables into N bins. Then, calculate the number and proportion of samples in each bin and display the results in a column chart, as shown in the following figure.
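The binning step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the component's implementation: the helper names `equal_width_edges` and `bin_proportions`, the equal-width binning strategy, and the sample values are all assumptions made for this example.

```python
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Return the n_bins - 1 interior cut points of equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def bin_proportions(values, edges):
    """Count the samples in each bin and return (counts, proportions).

    Bin i holds values v with edges[i-1] <= v < edges[i]. The first and
    last bins are open-ended, so out-of-range values are still counted.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    return counts, [c / total for c in counts]


# Hypothetical samples of one variable taken in two different months.
base_sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
test_sample = [2, 3, 5, 8, 9, 9, 10, 10, 10, 10]

# Derive the cut points from the base sample and reuse them for the test
# sample so that both proportion lists refer to the same bins.
edges = equal_width_edges(base_sample, 3)
base_counts, base_props = bin_proportions(base_sample, edges)
test_counts, test_props = bin_proportions(test_sample, edges)

# A rough text version of the column chart: one row per bin.
for i, (b, t) in enumerate(zip(base_props, test_props)):
    print(f"bin {i}: base {'#' * round(b * 20):<20} test {'#' * round(t * 20)}")
```

Reusing the base sample's cut points for the test sample is the key design choice here: the two sets of proportions are only comparable when they are computed over identical bins.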
This method lets you visually check for major changes in a variable across two sets of samples but does not provide a quantitative measure. This means you cannot use it for automatic monitoring of sample stability. This is why PSI is important. The formula to calculate PSI is shown below.
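For reference, PSI is commonly defined over the N bins as

$$\mathrm{PSI} = \sum_{i=1}^{N} (A_i - E_i)\,\ln\frac{A_i}{E_i}$$

where $E_i$ is the proportion of base samples in bin $i$ and $A_i$ is the proportion of test samples in the same bin. The symbol names are assumptions chosen for this example, not notation taken from the component. A minimal Python sketch of this definition follows. The small floor `eps` that guards against empty bins and the helper names `psi` and `interpret` are also assumptions, and the component may handle empty bins differently.

```python
import math


def psi(base_props, test_props, eps=1e-6):
    """Sum (test - base) * ln(test / base) over all bins."""
    if len(base_props) != len(test_props):
        raise ValueError("both samples must use the same bins")
    total = 0.0
    for b, t in zip(base_props, test_props):
        # Assumption: floor each proportion at eps so that an empty bin
        # does not cause a division by zero or log(0).
        b, t = max(b, eps), max(t, eps)
        total += (t - b) * math.log(t / b)
    return total


def interpret(value):
    """Map a PSI value to the thresholds given in the algorithm description."""
    if value < 0.1:
        return "not significant"
    if value <= 0.25:
        return "noticeable change"
    return "major change"


# Hypothetical bin proportions for a base month and a test month.
base_props = [0.3, 0.3, 0.4]
test_props = [0.2, 0.1, 0.7]

value = psi(base_props, test_props)
print(f"PSI = {value:.4f} ({interpret(value)})")
```

Because each term is a product of two factors with the same sign, every bin contributes a non-negative amount, and identical distributions give a PSI of exactly 0.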
Configure the component
Method 1: Use the GUI
On the Designer workflow page, add the Population Stability Index (PSI) component. Then, configure its parameters on the right side of the page:
Parameter type | Parameter | Description |
Fields setting | Features for PSI calculation | The feature columns for which to calculate the PSI. |
Execution tuning | Number of cores | The number of CPU cores to use. The system automatically allocates cores by default. |
Execution tuning | Memory size | The memory size for each CPU core, in MB. The system automatically allocates memory by default. |
Method 2: Use a PAI command
Use a PAI command to configure the parameters for the Population Stability Index (PSI) component. Run PAI commands using the SQL script component. For more information, see SQL Script.
PAI -name psi
-project algo_public
-DinputBaseTableName=psi_base_table
-DinputTestTableName=psi_test_table
-DoutputTableName=psi_bin_table
-DinputBinTableName=pai_index_table
-DfeatureColNames=fea1,fea2,fea3
-Dlifecycle=7
Parameter | Required | Default value | Description |
inputBaseTableName | Yes | None | The name of the base table. The distribution shift of the test table is measured relative to this table. |
inputBaseTablePartitions | No | The entire table | The partitions of the input base table. |
inputTestTableName | Yes | None | The name of the test table. Its distribution shift is measured relative to the base table. |
inputTestTablePartitions | No | The entire table | The partitions of the input test table. |
inputBinTableName | Yes | None | The name of the binning result table. |
featureColNames | No | All columns in the table | The feature columns for which to calculate the PSI. Separate multiple columns with commas. |
outputTableName | Yes | None | The name of the output metric table. |
lifecycle | No | None | The lifecycle of the output table. |
coreNum | No | Automatically allocated by the system | The number of CPU cores to use. |
memSizePerCore | No | Automatically allocated by the system | The memory size for each CPU core, in MB. |
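When the command is generated from a script, it can help to check the required parameters before submitting it. The following Python sketch assembles the command text from a dictionary and rejects input that omits any parameter marked as required in the table above. The helper `build_psi_command` is hypothetical and not part of any PAI SDK. It only produces the command string and does not run it.

```python
# Parameters marked "Yes" in the Required column of the table above.
REQUIRED = (
    "inputBaseTableName",
    "inputTestTableName",
    "inputBinTableName",
    "outputTableName",
)


def build_psi_command(params, project="algo_public"):
    """Return the PAI command text for the psi component.

    Hypothetical helper: it formats the text only, it does not submit it.
    """
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    lines = ["PAI -name psi", "-project " + project]
    for name, value in params.items():
        # Lists of column names are joined with commas, as in the example.
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        lines.append("-D{}={}".format(name, value))
    return "\n".join(lines)


command = build_psi_command({
    "inputBaseTableName": "psi_base_table",
    "inputTestTableName": "psi_test_table",
    "outputTableName": "psi_bin_table",
    "inputBinTableName": "pai_index_table",
    "featureColNames": ["fea1", "fea2", "fea3"],
    "lifecycle": 7,
})
print(command)
```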
Example
Bin the feature data before calculating the PSI. This requires a binning component. In the example shown in the following figure, the PSI component is connected to the two sample datasets and the binning component. To perform the calculation, you only need to configure the Features for PSI calculation parameter.

The following figure shows the PSI calculation results: