A critical review of existing and new population stability testing procedures in credit risk scoring
Credit scorecards are models of the probability that a client will default. The decision to extend credit to an applicant, as well as the price of the credit, is often based on these models. To ensure that scorecards remain accurate over time, the hypothesis of population stability is tested periodically; that is, the hypothesis that the distributions of client attributes at the time the scorecard was developed are still representative of these distributions at the time of review. A number of measures of population stability are used in practice, and several have been proposed in the recent literature. This paper provides a critical review of several procedures for testing this hypothesis. The widely used population stability index is discussed alongside two recently proposed techniques. Additionally, the use of classical goodness-of-fit techniques is considered and the problems associated with large samples are investigated. In addition to the existing testing procedures, we propose two new techniques for testing population stability. The first is based on the calculation of effect sizes, which do not suffer from the same problems as classical goodness-of-fit techniques when faced with large samples. The second proposed procedure is the so-called overlapping statistic. We argue that this simple measure can be useful due to its intuitive interpretation. To demonstrate the use of the various measures, and to highlight their strengths and weaknesses, several numerical examples are included.
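To make the kinds of measures named above concrete, the following Python sketch compares binned attribute proportions at development and at review. It is an illustration under stated assumptions, not the paper's definitions: the population stability index uses the commonly quoted form, the sum over bins of (a_i - e_i) ln(a_i / e_i); the effect size is assumed to be Cohen's w; and the overlapping statistic is assumed to be the sum over bins of min(e_i, a_i). The function names and the example proportions are hypothetical.

```python
import math

# All three functions take two equal-length lists of bin proportions:
# `expected` (at scorecard development) and `actual` (at review).
# Assumption: every proportion is strictly positive and each list sums to 1.

def psi(expected, actual):
    """Population stability index in its commonly quoted form."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

def cohen_w(expected, actual):
    """Cohen's w effect size (assumed here; the paper's choice may differ)."""
    return math.sqrt(sum((a - e) ** 2 / e for e, a in zip(expected, actual)))

def overlap(expected, actual):
    """Shared probability mass: 1 for identical distributions, 0 for disjoint ones."""
    return sum(min(e, a) for e, a in zip(expected, actual))

# Hypothetical proportions over five bins of one attribute.
development = [0.10, 0.20, 0.40, 0.20, 0.10]
review = [0.12, 0.22, 0.38, 0.18, 0.10]

print(f"PSI      = {psi(development, review):.4f}")
print(f"Cohen's w = {cohen_w(development, review):.4f}")
print(f"Overlap  = {overlap(development, review):.4f}")
```

None of these quantities depends on the sample size once the proportions are fixed, in contrast to a classical goodness-of-fit statistic, whose value grows with the number of observations for the same difference in proportions.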