Consumer Credit Risk Models via Machine-Learning Algorithms∗
Amir E. Khandani†, Adlar J. Kim‡, and Andrew W. Lo§
This Draft: May 9, 2010
Abstract
We apply machine-learning techniques to construct nonlinear nonparametric forecasting models of consumer credit risk. By combining customer transactions and credit bureau data from January 2005 to April 2009 for a sample of a major commercial bank’s customers, we are able to construct out-of-sample forecasts that significantly improve the classification rates of credit-card-holder delinquencies and defaults, with linear regression R²’s of forecasted/realized delinquencies of 85%. Using conservative assumptions for the costs and benefits of cutting credit lines based on machine-learning forecasts, we estimate the cost savings to range from 6% to 25% of total losses. Moreover, the time-series patterns of estimated delinquency rates from this model over the course of the recent financial crisis suggest that aggregated consumer-credit risk analytics may have important applications in forecasting systemic risk.
Keywords: Household Behavior; Consumer Credit Risk; Credit Card Borrowing; Machine Learning; Nonparametric Estimation
JEL Classification: G21, G33, G32, G17, G01, D14
∗ The views and opinions expressed in this article are those of the authors only, and do not necessarily represent the views and opinions of AlphaSimplex Group, MIT, any of their affiliates and employees, or any of the individuals acknowledged below. We thank Tanya Giovacchini, Frank Moss, Deb Roy, and participants of the Media Lab’s Center for Future Banking Seminar for helpful comments and discussion. Research support from the MIT Laboratory for Financial Engineering and the Media Lab’s Center for Future Banking is gratefully acknowledged.
† Postdoctoral Associate, MIT Sloan School of Management and Laboratory for Financial Engineering.
‡ Postdoctoral Associate, MIT Sloan School of Management and Laboratory for Financial Engineering.
§ Harris & Harris Group Professor, MIT Sloan School of Management; director, Laboratory for Financial Engineering.
Contents
1 Introduction
2 The Data
2.1 Data Sources
2.2 Data Security
2.3 Data Trends from 2005 to 2008
3 Constructing Feature Vectors
3.1 High Balance-to-Income Ratio
3.2 Negative Income Shocks
4 Modeling Methodology
4.1 Machine-Learning Framework
4.2 Model Inputs
4.3 Comparison to CScore
4.4 Model Evaluation
5 Applications
5.1 Credit-Line Risk Management
5.2 Robustness
5.3 Macroprudential Risk Management
6 Conclusion
A Appendix
A.1 Definitions of Performance Metrics
A.2 Framework for Calculating Value Added
References
1 Introduction
One of the most important drivers of macroeconomic conditions and systemic risk is consumer spending, which accounted for over two thirds of U.S. gross domestic product as of October 2008. With $13.63 trillion of consumer credit outstanding as of the fourth quarter of 2008 ($10.47 trillion in mortgages and $2.59 trillion in other consumer debt), the opportunities and risk exposures in consumer lending are equally outsized.¹ For example, as a result of the recent financial crisis, the overall charge-off rate in all revolving consumer credit across all U.S. lending institutions reached 10.1% in the third quarter of 2009, far exceeding the average charge-off rate of 4.72% from 2003 to 2007.² With a total of $874 billion of revolving consumer credit outstanding in the U.S. economy as of November 2009,³ and with 46.1% of all families carrying a positive credit-card balance in 2007,⁴ the potential for further systemic dislocation in this sector has made the economic behavior of consumers a topic of vital national interest.
The large number of decisions involved in the consumer lending business makes it necessary to rely on models and algorithms rather than human discretion, and to base such algorithmic decisions on “hard” information, e.g., characteristics contained in consumer credit files collected by credit bureau agencies. Models are typically used to generate numerical “scores” that summarize the creditworthiness of consumers.⁵ In addition, it is common for lending institutions and credit bureaus to create their own customized risk models based on private information about borrowers. This private information typically consists of both “within-account” and “across-account” data regarding customers’ past behavior.⁶
However, while such models are generally able to produce reasonably accurate ordinal measures, i.e., rankings, of consumer creditworthiness, these measures adjust only slowly over time and are relatively insensitive to changes in market conditions. Given the apparent speed with which consumer credit can deteriorate, there is a clear need for more timely cardinal measures of credit risk by banks and regulators.

¹ U.S. Federal Reserve Flow of Funds data, June 11, 2009 release.
² Data available from the Federal Reserve Board at http://www.federalreserve.gov/releases/chargeoff/.
³ See the latest release of the Consumer Credit Report published by the Federal Reserve Board, available at http://www.federalreserve.gov/releases/g19/Current/
⁴ See the Survey of Consumer Finances, 2009 (SCF), released in February 2009 and available at http://www.federalreserve.gov/pubs/bulletin/2009/pdf/scf09.pdf. This report shows that the median balance for those carrying a non-zero balance was $3,000, while the mean was $7,300. These values have risen 25% and 30.4%, respectively, since the previous version of the SCF conducted three years earlier. The SCF also reports that the median balance has risen strongly for most demographic groups, particularly for higher-income groups.
⁵ See Hand and Henley (1997) and Thomas (2009) for reviews of traditional and more recent statistical modeling approaches to credit scoring.
⁶ The impact of such relationship information in facilitating banking engagement has been studied extensively in the area of corporate and small business lending (see, for example, Boot 2000) and, more recently, in consumer lending (see Agarwal, Chomsisengphet, Liu and Souleles, 2009).
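The distinction between ordinal and cardinal measures can be illustrated with a minimal Python sketch. The default probabilities for the five borrowers below are hypothetical numbers invented for illustration, not data from the Bank. Doubling every probability, as a uniform deterioration in conditions might, leaves the ranking unchanged even though the expected default rate doubles.

```python
def ranks(values):
    """Return the 1-based rank of each value (1 = riskiest): ordinal information only."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def expected_default_rate(probs):
    """Cardinal summary: average default probability across borrowers."""
    return sum(probs) / len(probs)


# Hypothetical 6-month default probabilities for five borrowers (illustrative only).
calm = [0.01, 0.03, 0.02, 0.08, 0.05]
# Same borrowers after a uniform deterioration: every probability doubles.
stressed = [2 * p for p in calm]

print(ranks(calm))                              # [5, 3, 4, 1, 2]
print(ranks(stressed))                          # [5, 3, 4, 1, 2]
print(round(expected_default_rate(calm), 4))    # 0.038
print(round(expected_default_rate(stressed), 4))  # 0.076
```

Because the rank vector is identical in both scenarios, a purely ordinal score carries no signal about the deterioration; only the cardinal summary moves.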
In this paper, we propose a cardinal measure of consumer credit risk that combines traditional credit factors such as debt-to-income ratios with consumer banking transactions, which greatly enhances the predictive power of our model. Using a proprietary dataset from a major commercial bank (which we shall refer to as the “Bank” throughout this paper to preserve confidentiality) from January 2005 to April 2009, we show that conditioning on certain changes in a consumer’s bank-account activity can lead to considerably more accurate forecasts of credit-card delinquencies in the future. For example, in our sample, the unconditional probability of customers falling 90-days-or-more delinquent on their payments over any given 6-month period is 5.3%, but customers experiencing a recent decline in income, as measured by sharp drops in direct deposits, have a 10.8% probability of 90-days-or-more delinquency over the subsequent 6 months. Such conditioning variables are statistically reliable throughout the entire sample period, and our approach is able to generate many variants of these transactions-based predictors and combine them in nonlinear ways with credit-bureau data to yield even more powerful forecasts. By analyzing patterns in consumer expenditures, savings, and debt payments, we are able to identify subtle nonlinear relationships that are difficult to detect in these massive datasets using standard consumer credit-default models such as logit, discriminant analysis, or credit scores.
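The comparison of 5.3% unconditional against 10.8% conditional delinquency amounts to computing a delinquency rate over all customer windows and over the subset flagged by a transaction-based indicator. The Python sketch below assumes a hypothetical record layout with two boolean fields, `income_drop` (a sharp fall in direct deposits) and `delinquent_90` (90-days-or-more delinquent within the next 6 months); the toy records are invented and do not reproduce the paper's figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerWindow:
    """One customer observed over one 6-month forecast window (hypothetical layout)."""
    income_drop: bool     # sharp decline in direct deposits before the window
    delinquent_90: bool   # 90-days-or-more delinquent during the window


def delinquency_rate(records):
    """Fraction of records that became 90+ days delinquent."""
    if not records:
        raise ValueError("no records")
    return sum(r.delinquent_90 for r in records) / len(records)


def conditional_rate(records, condition):
    """Delinquency rate among the records that satisfy `condition`."""
    return delinquency_rate([r for r in records if condition(r)])


# Toy sample of 20 windows, 4 of which follow an income drop (invented numbers).
sample = (
    [CustomerWindow(income_drop=True, delinquent_90=True)] * 1
    + [CustomerWindow(income_drop=True, delinquent_90=False)] * 3
    + [CustomerWindow(income_drop=False, delinquent_90=True)] * 1
    + [CustomerWindow(income_drop=False, delinquent_90=False)] * 15
)

unconditional = delinquency_rate(sample)                        # 2 of 20 windows
given_drop = conditional_rate(sample, lambda r: r.income_drop)  # 1 of 4 windows
print(f"unconditional: {unconditional:.1%}, after income drop: {given_drop:.1%}")
# prints "unconditional: 10.0%, after income drop: 25.0%"
```

A conditioning variable is informative when the conditional rate differs materially from the unconditional one, as it does for the income-drop flag in the paper's sample.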
We use an approach known as “machine learning” in the computer science literature, which refers to a set of algorithms specifically designed to tackle computationally intensive pattern-recognition problems in extremely large datasets. These techniques include radial basis functions, tree-based classifiers, and support-vector machines, and are ideally suited for consumer credit-risk analytics because of the large sample sizes and the complexity of the possible relationships among consumer transactions and characteristics.⁷ The extraordinary speed-up in computing in recent years, coupled with significant theoretical advances in machine-learning algorithms, has created a renaissance in computational modeling, of which our consumer credit-risk model is just one of many recent examples.

⁷ See, for example, Li, Shiue, and Huang (2006) and Bellotti and Crook (2009) for applications of machine-learning-based models to consumer credit.
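Of the techniques listed, tree-based classifiers are the simplest to sketch. The Python code below builds a depth-one tree (a decision stump) that picks the single feature and threshold minimizing weighted Gini impurity. It is a generic textbook construction, not the model used in this paper, and the two features (a hypothetical balance-to-income ratio and a hypothetical fractional drop in direct deposits) and the toy labels are invented.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def fit_stump(X, y):
    """Return (feature index, threshold) minimizing weighted Gini impurity.

    Rows with X[i][j] <= threshold go left, the rest go right. The largest
    value of each feature is skipped so that both sides are non-empty.
    """
    n = len(y)
    best = (None, None, float("inf"))
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best[0], best[1]


def leaf_rates(X, y, feature, threshold):
    """Empirical delinquency rate in the (left, right) leaves."""
    left = [y[i] for i in range(len(y)) if X[i][feature] <= threshold]
    right = [y[i] for i in range(len(y)) if X[i][feature] > threshold]
    return sum(left) / len(left), sum(right) / len(right)


# Hypothetical features per customer: [balance_to_income, deposit_drop].
X = [[0.2, 0.0], [0.5, 0.1], [0.9, 0.0], [1.4, 0.6],
     [0.3, 0.7], [1.8, 0.5], [0.4, 0.0], [1.1, 0.8]]
y = [0, 0, 0, 1, 1, 1, 0, 1]  # 1 = became 90+ days delinquent (invented)

feature, threshold = fit_stump(X, y)
print(feature, threshold)                       # 1 0.1
print(leaf_rates(X, y, feature, threshold))     # (0.0, 1.0)
```

Full tree-based classifiers typically grow such splits recursively and control overfitting through pruning or ensembles; the stump only shows how a single nonlinear threshold is learned from data rather than imposed by a linear model.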
One measure of the forecast power of our approach is to compare the machine-learning model’s forecasted scores of those customers who eventually default during the forecast period with the forecasted scores of those who do not. Significant differences between the forecasts of the two populations are an indication that the forecasts have genuine discriminating power. Over the sample period from May 2008 to April 2009, the average forecasted score among individuals who do become 90-days-or-more delinquent during the 6-month forecast period is 61.9, while the average score across all customers is 2.1. The practical value-added of such forecasts can be estimated by adding the cost savings from credit reductions to high-risk borrowers and subtracting the lost revenues from “false positives”, and under a conservative set of assumptions, we estimate the potential net benefits of these forecasts to be 6% to 25% of total losses.
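The netting of savings against false-positive costs can be made concrete with a deliberately simplified calculation. The Python sketch below is not the paper's value-added framework; it assumes, hypothetically, that cutting the line of a borrower who would have defaulted avoids a fixed fraction of that borrower's loss, and that cutting the line of a borrower who would not have defaulted forfeits a fixed amount of revenue. All parameter names and dollar figures are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlaggedOutcome:
    """Aggregate results of a line-cut policy over one forecast window (hypothetical)."""
    true_positive_losses: float   # losses on flagged borrowers who did default
    false_positive_count: int     # flagged borrowers who did not default
    total_losses: float           # losses across all borrowers with no intervention


def net_value_added(outcome, saved_fraction, revenue_lost_per_fp):
    """Net benefit of cutting lines, as a fraction of total losses.

    saved_fraction: assumed share of a defaulter's loss avoided by the cut.
    revenue_lost_per_fp: assumed revenue forfeited per false positive.
    """
    savings = saved_fraction * outcome.true_positive_losses
    lost_revenue = revenue_lost_per_fp * outcome.false_positive_count
    return (savings - lost_revenue) / outcome.total_losses


# Invented figures: $10M total losses, $4M of it on correctly flagged borrowers,
# and 500 false positives.
outcome = FlaggedOutcome(true_positive_losses=4_000_000,
                         false_positive_count=500,
                         total_losses=10_000_000)
print(f"{net_value_added(outcome, saved_fraction=0.5, revenue_lost_per_fp=1_000):.1%}")
# prints "15.0%"
```

Because the false-positive term enters with a negative sign, a model that flags too many borrowers can produce negative value added even if it catches most eventual defaulters.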
More importantly, by aggregating individual forecasts, it is possible to construct a measure of systemic risk in the consumer-lending sector, which accounts for one of the largest components of U.S. economic activity. We show that the time-series properties of our machine-learning forecasts are highly correlated with realized delinquency rates (linear regression R²’s of 85%), implying that a considerable portion of the consumer credit cycle can be forecasted 6 to 12 months in advance. This has obvious implications for macroprudential risk management.
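The aggregation and the R² comparison can be sketched as follows: average the individual forecasted delinquency probabilities in each period to obtain an aggregate forecast, then regress realized delinquency rates on it by ordinary least squares. The Python code below uses invented monthly series and assumes each aggregate forecast has already been aligned with the realized rate of the period it forecasts; it illustrates the computation only and does not reproduce the paper's 85% figure.

```python
from statistics import fmean


def aggregate_forecast(individual_probs):
    """Aggregate risk measure for one period: mean forecasted delinquency probability."""
    return fmean(individual_probs)


def ols_r_squared(x, y):
    """R^2 of the simple linear regression y = a + b * x + error."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined for a constant series")
    # With one regressor, R^2 equals the squared sample correlation of x and y.
    return sxy * sxy / (sxx * syy)


# Invented per-customer forecasts for four periods (three customers each).
forecasts_by_period = [
    [0.01, 0.02, 0.03],
    [0.02, 0.03, 0.04],
    [0.03, 0.05, 0.07],
    [0.05, 0.06, 0.10],
]
realized_rates = [0.021, 0.028, 0.052, 0.069]  # invented realized delinquency rates

aggregate = [aggregate_forecast(p) for p in forecasts_by_period]
print([round(a, 4) for a in aggregate])   # [0.02, 0.03, 0.05, 0.07]
print(round(ols_r_squared(aggregate, realized_rates), 3))
```

Since R² in a one-regressor OLS fit is the squared correlation, it is the same whether realized rates are regressed on forecasts or the reverse.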
In Section 2, we describe our dataset, discuss the security issues surrounding it, and document some simple but profound empirical trends. Section 3 outlines our approach to constructing useful variables or feature vectors that will serve as inputs to the machine-learning algorithms we employ. In Section 4, we describe the machine-learning framework for combining multiple predictors to create more powerful forecast models, and present our empirical results. Using these results, we provide two applications in Section 5, one involving model-based credit-line reductions and the other focusing on systemic risk measures. We conclude in Section 6.
2 The Data
In this study, we use a unique dataset consisting of transaction-level, credit-bureau, and account-balance data for individual consumers. This data is obtained for a subset of the Bank’s customer base for the period from January 2005 to April 2009. Integrating transaction, credit-bureau, and account-balance data allows us to compute and update measures