I argue that these properties, coupled with the fact that interpretability is often an important requirement in such applications, mean that the form of logistic regression used in the industry should be the method of choice.
Workshop:
"Statistical data mining between research and practice"
27-28 February 2004, Hamburg
David Hand
(Head of Statistics Section,
Department of Mathematics,
Imperial College London)
Academic enthusiasms and practical realities in classification problems
The consumer credit industry makes extensive use of predictive models called scorecards. Originally, these were used to predict the risk of default of applicants for a loan, so that accept/reject decisions could be made. In recent years, however, the use of such models has expanded dramatically, and they are now being used to predict likely default, mortgage churn, attrition, credit card transaction and repayment behaviour, future behaviour of existing customers, and a wide variety of other outcomes. The industry-standard approach to these models is logistic regression. In recent decades, however, several distinct research communities, including statistics, pattern recognition, operations research, machine learning, and data mining, have developed entirely new classes of classification algorithm, including such tools as classification trees, neural networks, nearest neighbour methods, support vector machines, and many others. Much work has been published demonstrating superior performance of these tools compared with logistic regression. It is natural, therefore, that these new methods should be applied in the consumer credit industry. However, although these new methods have been adopted for use in certain specialised applications (such as fraud detection), they have had relatively little impact on the central position occupied by logistic regression. This talk briefly reviews these new methods, and then examines why their impact has been so small in this sector. There appear to be several reasons.
Firstly, the form of logistic regression used in the industry is more flexible than it might seem, in fact forming a particular kind of generalised additive model.
Secondly, the more sophisticated methods are based on a classical statistical paradigm which, I argue, is not perfectly matched to credit scoring problems because it fails to take account of important aspects of the application.
Thirdly, I show that the marginal gains of increasing scorecard complexity are typically small, so that the 'improvement' of the more complex models may often be illusory.
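The first point, that an industry scorecard is a kind of generalised additive model, can be illustrated with a small sketch. It assumes the common practice (not spelled out in this abstract) of cutting each characteristic into bands and giving each band its own weight, so that the log-odds are a sum of one step function per characteristic. Every characteristic name, cut-point, and weight below is hypothetical and chosen only for illustration.

```python
import math
from bisect import bisect_right

# Hypothetical scorecard: all names, cut-points and weights are invented
# for illustration; none of them comes from the talk.
INTERCEPT = -2.0  # baseline log-odds of default (assumed)

SCORECARD = {
    # characteristic: (band boundaries, log-odds weight for each band)
    # A value v falls in band i when cuts[i-1] <= v < cuts[i].
    "age_years": ([25, 40, 60], [0.8, 0.3, 0.0, -0.4]),
    "months_at_address": ([12, 60], [0.5, 0.1, -0.3]),
}


def component(name, value):
    """Step function f_j(x_j): return the weight of the band holding value."""
    cuts, weights = SCORECARD[name]
    return weights[bisect_right(cuts, value)]


def log_odds(applicant):
    """Additive predictor: intercept plus one component per characteristic."""
    return INTERCEPT + sum(component(k, v) for k, v in applicant.items())


def default_probability(applicant):
    """The logistic link turns the additive log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds(applicant)))


if __name__ == "__main__":
    applicant = {"age_years": 22, "months_at_address": 6}
    print(round(log_odds(applicant), 3), round(default_probability(applicant), 3))
```

Because each characteristic contributes through its own arbitrary step function, the model can capture non-linear effects of a single variable while remaining additive, which is what keeps the per-band weights easy to read and explain.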
20 Feb. 2004, by Stefan Heitmann