Predicting loss given default (LGD) for residential mortgage loans: A two-stage model and empirical evidence for UK bank data


International Journal of Forecasting 28 (2012) 183–195

Contents lists available at SciVerse ScienceDirect

International Journal of Forecasting

journal homepage: www.elsevier.com/locate/ijforecast

Mindy Leow a,∗, Christophe Mues b,1

a University of Edinburgh, Business School, 29 Buccleuch Place, Edinburgh EH8 9JS, United Kingdom
b University of Southampton, School of Management, University Road, Southampton SO17 1BJ, United Kingdom

Article info

Keywords: Regression; Finance; Credit risk modelling; Mortgage loans; Loss distributions; Basel II

Abstract

With the implementation of the Basel II regulatory framework, it became increasingly important for financial institutions to develop accurate loss models. This work investigates the loss given default (LGD) of mortgage loans using a large set of recovery data of residential mortgage defaults from a major UK bank. A Probability of Repossession Model and a Haircut Model are developed and then combined to give an expected loss percentage. We find that the Probability of Repossession Model should consist of more than just the commonly used loan-to-value ratio, and that the estimation of LGD benefits from the Haircut Model, which predicts the discount which the sale price of a repossessed property may undergo. This two-stage LGD model is shown to perform better than a single-stage LGD model (which models LGD directly from loan and collateral characteristics), as it achieves a better R² value and matches the distribution of the observed LGD more accurately.

© 2011 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

1. Introduction

Following the introduction of the Basel II Accord, financial institutions are now required to hold a minimum amount of capital based on their estimated exposure to credit risk, market risk and operational risk. According to Pillar 1 of the new Basel II capital framework, the minimum capital required by financial institutions to account for their exposure to credit risk can be calculated using one of two approaches, either the Standardized Approach or the Internal Ratings Based (IRB) Approach. The IRB approach is further split into two and can be implemented using either the Foundation IRB Approach or the Advanced IRB Approach. Under the Advanced IRB Approach, financial institutions are required to develop their own models for the estimation of three credit risk components, viz. Probability of Default (PD), Exposure at Default (EAD) and Loss Given Default (LGD), for each section of their credit risk portfolios. The portfolios of a financial institution can be broadly divided into the retail sector, consisting of consumer loans like credit cards, personal loans or residential mortgage loans, and the wholesale sector, which includes corporate exposures such as commercial and industrial loans. The work here pertains to residential mortgage loans.

∗ Corresponding author. Tel.: +44 131 650 9850; fax: +44 131 651 3197. E-mail address: [email protected] (M. Leow).

1 Tel.: +44 23 8059 2561; fax: +44 23 8059 3844.

0169-2070/$ – see front matter © 2011 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.ijforecast.2011.01.010

In the United Kingdom, as in the US, the local Basel II regulation specifies that a mortgage loan exposure is in default if the debtor has missed payments for 180 consecutive days (Financial Services Authority, 2009 (FSA), BIPRU 4.3.56 and 4.6.20; Federal Register, 2007). When a loan goes into default, financial institutions could (1) contact the debtor for a re-evaluation of the loan, whereby the debtor would have to pay a slightly higher interest rate on the remaining loan but would have lower and more manageable monthly repayment amounts; or (2) decide to sell the loan to a separate company which works specifically towards the collection of repayments from defaulted loans; or (3) repossess the property (i.e. enter foreclosure) and sell it to cover losses, since every mortgage loan has a physical security (also known as collateral), i.e. a house or flat. In this last case, there are two possible outcomes: either the sale of the property is able to cover the value of the loan outstanding and the associated repossession costs, with any excess being returned to the customer, resulting in a zero loss rate, or the sale proceeds are less than the outstanding balance and costs and there is a loss. Note that the distribution of LGD in the event of repossession is thus capped at one end. The aim of LGD modelling in the context of residential mortgage lending is to estimate this loss accurately as a proportion of the outstanding loan, if the loan were to go into default. In this paper, we will empirically investigate a two-stage approach for estimating the mortgage LGD on a set of recovery data relating to residential mortgages from a major UK bank.

The rest of this paper is structured as follows. Section 2 consists of a literature review and discusses some mortgage LGD models currently in use in the UK, followed by Section 3 which lists our research objectives. In Section 4, we describe the available data, as well as the pre-processing applied to it. In Sections 5 and 6, we detail the Probability of Repossession Model and the Haircut Model, respectively. Section 7 explains how the component models are combined to form the LGD model. Finally, in Section 8 we look at some possible further extensions of this work and conclude.

2. Literature review

Much of the work in the literature on the prediction of LGD, and to some extent PD, pertains to the corporate sector (see Altman, Brady, Resti, & Sironi, 2005; Gupton & Stein, 2002; Jarrow, 2001; Schuermann, 2004; Truck, Harpaintner, & Rachev, 2005), which can be explained partly by the greater availability of (public) data and partly by the fact that the financial health or status of the debtor companies can be inferred directly from the share and bond prices traded on the market. However, this is not the case in the retail sector, which partly explains why the LGD models for the retail sector are not as developed as those pertaining to corporate loans.

2.1. Risk models for residential mortgage lending

Despite the lack of publicly available data, particularly for individual loans, there are still a number of interesting studies on credit risk models for mortgage lending that use in-house data from lenders. However, in the past, the majority of these have focused on the prediction of default risk, as is comprehensively detailed by Quercia and Stegman (1992). One of the earliest papers on mortgage default risk was by von Furstenberg (1969), who found that various characteristics of a mortgage loan can be used to predict whether default will occur. These include the loan-to-value ratio (i.e. the ratio of the loan amount to the value of the property) at origination, term of mortgage, and age and income of the debtor. Following that, Campbell and Dietrich (1983) further expanded on the analysis by investigating the impact of macroeconomic variables on the mortgage default risk. They found that the loan-to-value ratio is indeed a significant factor, and that the economy, especially local unemployment rates, does affect default rates. This was confirmed more recently by Calem and LaCour-Little (2004), who looked at estimating both the default probability and the recovery rate (where recovery rate = 1 − LGD) on defaulted loans from the Office of Federal Housing Enterprise Oversight (OFHEO). They estimated the recovery rate by employing a spline regression to accommodate the non-linear relationships that were observed between the two loan-to-value ratios (LTV at loan start and LTV at default) and the recovery rate, which achieved an R² of 0.25.

Similarly to Calem and LaCour-Little (2004), Qi and Yang (2009) also modelled the loss directly using various characteristics of defaulted loans, using data from private mortgage insurance companies, and in particular on accounts with high loan-to-value ratios that have gone into default. In their analysis, they were able to achieve high values of R² (around 0.6), which could be attributable to their being able to re-value the properties at the time of default (expert-based information which would not normally be available to lenders on all loans, and hence one would not be able to use it in the context of Basel II, which requires the estimation of LGD models that can be applied to all loans, not just defaulted loans).

2.2. Single vs. two-stage LGD models

While the former models estimate LGD directly and will thus be referred to as "single-stage" models, the idea of a so-called "two-stage" model is to incorporate two component models, the Probability of Repossession Model and the Haircut Model, into the LGD modelling. Initially, the Probability of Repossession Model is used to predict the likelihood of a defaulted mortgage account undergoing repossession. It is sometimes thought that the probability of repossession is dependent mainly on one variable, viz. the loan-to-value ratio, and hence some probability of repossession models which are currently in use consist only of this single variable. This is then followed by a second model which estimates the discount which the sale price of the repossessed property would undergo. The Haircut Model predicts the difference between the forced sale price and the market valuation of the repossessed property. These two models are then combined to get an estimate of the loss, given that a mortgage loan goes into default. An example study involving the two-stage model is that of Somers and Whittaker (2007), who, although they did not detail the development of their Probability of Repossession Model, acknowledged the methodology for the estimation of the mortgage loan LGD. In their paper, they focus on the consistent discount (haircut) in sale prices observed in the case of repossessed properties, and because they observe a non-normal distribution of this haircut, they propose the use of quantile regression in the estimation of the predicted haircut. Another paper which investigates the variability that the value of collateral undergoes is by Jokivuolle and Peura (2003). Although their work was on default and recovery on corporate loans, they highlight the correlation between the value of the collateral and recovery.


In summary, despite the increased importance of LGD models in consumer lending and the need to estimate residential mortgage loan default losses at the individual loan level, relatively few papers have been published in this area to date, apart from the ones mentioned above.

3. Research objectives

From the literature review, we observe that the few papers which looked at mortgage loss either did so by modelling LGD directly ("single-stage" models) using economic variables and characteristics of loans which were in default or did not look at both components of a two-stage model, i.e. haircut as well as repossession. This might be due to their analyses being carried out on samples of loans which had already undergone default and subsequent repossession, and thus removed the need for differentiating between accounts that would undergo repossession and those that would not. We also note that there was little consideration of possible correlations between explanatory variables.

Hence, the two main objectives of this paper are as follows. Firstly, we evaluate the added value of a Probability of Repossession Model with more than one variable (the loan-to-value ratio). Secondly, using real-life UK bank data, we empirically validate the approach of using two component models, the Probability of Repossession Model and the Haircut Model, to create a model that produces estimates of LGD. We develop the two component models separately before combining them by weighting conditional loss estimates against their estimated outcome probabilities.

4. Data

The dataset used in this study was supplied by a major UK bank, with observations coming from all parts of the UK, including Scotland, Wales and Northern Ireland. There are more than 140,000 observations and 93 variables in the original dataset, all of which are on defaulted mortgage loans, with each account being identified by a unique account number. About 35% of the accounts in the dataset undergo repossession, and the time between default and repossession varies from a couple of months to several years. After pre-processing (see Section 5), we retain about 120,000 observations, with accounts that start between the years 1983 and 2001 (note that loans predating 1983 were removed because of the unavailability of house price index data for these older loans) and default between the years of 1988 and 2001, with at least a two-year outcome window (for repossession to happen, if any). Note that this sample does not include observations from the recent economic downturn.

Under the Basel II framework, financial institutions are required to forecast both default over a 12-month horizon and the resulting losses at a given time (referred to here as the "observation time"). As such, any LGD models developed should not contain information which is only available at the time of default. However, due to limitations in the dataset, in which information on the state of the account in the months leading up to default (e.g. the outstanding balance at the observation time) is not available, we use the approximate default time instead of the observation time. When applying this model at a given time point, a forward-looking adjustment could then be applied to convert the current value of a variable, for example the outstanding balance, to an estimate at the time of default. Default-time variables for which no reasonable projection is available are removed.

4.1. Multiple defaults

Some accounts have repeated observations, which means that some customers were oscillating between keeping up with their normal repayments and going into default. Thus, each default is recorded as a separate observation of the characteristics of the loan at that time. Because the UK Basel II regulations state that the financial institution should return an exposure to non-default status in the case of recovery, and record another default should the same exposure subsequently go into default again (The Financial Services Authority (FSA), 2009, BIPRU 4.3.71), we include all instances of default in our analysis, and record each default that is not the final instance of default as having a zero LGD (in the absence of further cost information). We note that other approaches for dealing with repeated defaults could also be considered, depending on the local regulatory guidelines.

4.2. Time on book

The time on book is calculated to be the time between the start date of the loan and the approximate date of default.² The variable time on book exhibits an obvious increasing trend over time (see Fig. 1³), which might be due partly to the composition of the dataset. In the dataset, we have defaults between the years 1988 and 2001, which just about coincides with the start of the UK economic downturn of the early nineties. We observe that the mean time on book for observations that default during the economic downturn is significantly lower than the mean time on book for observations that default in normal economic times.

Fig. 1. Mean time on book over time, with reference to the year of default. [Plot of mean time on book (y-axis, scale omitted) against year of default (x-axis, 1988 to 2001), for all observations in default.]

2 The date of default was estimated by the bank using the arrears status and the amount of cumulated arrears at the end of each year for each account, because we are not given an explicit default date in the original dataset.

3 Due to a data confidentiality agreement with the data provider, the scale of the y-axis has been omitted in some of the figures.

4.3. Valuation of security at default and haircut

Information about the market value of the property is obtained at the time of the loan application. As reassessing its value would be a costly exercise, no new market value assessment tends to be undertaken thereafter, and a valuation of the property at various points of the loan can be obtained by updating the initial property value using the publicly available Halifax House Price Index⁴ (all houses, all buyers, non-seasonally adjusted, quarterly, regional). The valuation of security at default is calculated according to Eq. (1):

$$
\text{Valuation of security}_{\text{default}} = \frac{\text{HPI}_{\text{def yr, def qtr, region}}}{\text{HPI}_{\text{start yr, start qtr, region}}} \times \text{Valuation of security}_{\text{start}}. \tag{1}
$$

4 Available from: http://www.lloydsbankinggroup.com/media1/research/halifax_hpi.asp.

Using this valuation of security at default, various other variables are then updated. One such variable is the valuation of the property as a proportion of the average property value in the region, which gives an indication of the quality of the property relative to other properties in the same area; another is the LTV at default (DLTV), which is the ratio of the outstanding loan at default to the valuation of the security at default; yet another is the haircut,⁵ which we define as the ratio of the forced sale price to the valuation of property at the default quarter (only for observations with a valid forced sale price). For example, a property which is estimated to have a market value of £1,000,000 but which was repossessed and sold for £700,000 would have a haircut of £700,000 / £1,000,000 = 0.7.
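The derived variables above can be illustrated with a minimal Python sketch. The function names and the index and loan figures are hypothetical and chosen only for illustration; the arithmetic follows Eq. (1) and the definitions of DLTV and haircut given in the text.

```python
def valuation_at_default(valuation_start, hpi_start, hpi_default):
    """Roll the origination valuation forward with the regional HPI ratio, as in Eq. (1)."""
    return valuation_start * hpi_default / hpi_start


def ltv_at_default(outstanding_at_default, valuation_default):
    """DLTV: outstanding loan at default over the indexed valuation at default."""
    return outstanding_at_default / valuation_default


def haircut(forced_sale_price, valuation_default):
    """Haircut as defined in the text: forced sale price over valuation at default."""
    return forced_sale_price / valuation_default


# Hypothetical account: valued at 80,000 at origination, regional index up from 200 to 250.
v_def = valuation_at_default(80_000, hpi_start=200.0, hpi_default=250.0)
dltv = ltv_at_default(90_000, v_def)   # 90,000 still outstanding at default
h = haircut(70_000, v_def)             # property sold for 70,000 after repossession
print(v_def, dltv, h)  # 100000.0 0.9 0.7
```

In this toy case the haircut of 0.7 is below the DLTV of 0.9, so the sale proceeds do not cover the outstanding balance.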

4.4. Training and test set splits

To obtain unbiased estimates of model performance, we set aside an independent test dataset. We develop each component model using a training set, before applying the models to a separate test set which was not involved in the development of the model itself, to gauge the performance of the model and to ensure that there is no over-fitting. To do so, we split the cleaned dataset into two-third and one-third sub-samples, keeping the proportion of repossessions the same in the two sets (i.e. stratified by repossession cases). These are then used as the respective training and test sets for the Probability of Repossession Model. However, since a haircut can only be calculated in the event of repossession and sale, all non-repossessions will subsequently be removed from the training and test samples for the second Haircut model component.

5 A more common definition of haircut is the complement, 1 − (sale price / valuation of security at default). However, in this paper, we will use the term "haircut" to refer to the ratio, rather than its complement, in order to facilitate the interpretation of the parameter signs and further notation. We note that which of these two definitions is used does not affect the actual modelling, but will make a difference for the interpretation of the coefficient signs.
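The stratified two-thirds/one-third split of Section 4.4 can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the `repossessed` key, the fixed seed and the toy data are hypothetical, and the paper does not say which software or random seed was used.

```python
import random


def stratified_split(records, label_key, test_fraction=1 / 3, seed=0):
    """Split records into train/test so each label keeps the same share in both sets."""
    rng = random.Random(seed)
    by_label = {}
    for rec in records:
        by_label.setdefault(rec[label_key], []).append(rec)

    train, test = [], []
    for group in by_label.values():
        shuffled = group[:]          # copy so the caller's list is not reordered
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Toy data: 300 defaulted accounts, 35% of which end in repossession.
accounts = [{"id": i, "repossessed": i < 105} for i in range(300)]
train, test = stratified_split(accounts, "repossessed")

# The haircut model only sees repossessed accounts from each set.
haircut_train = [a for a in train if a["repossessed"]]
haircut_test = [a for a in test if a["repossessed"]]
print(len(train), len(test), len(haircut_train), len(haircut_test))
```

Splitting within each class, rather than over the pooled data, is what keeps the repossession share identical in both sets.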

4.5. Loss given default

When a loan goes into default and the property is subsequently repossessed by the bank and sold, legal, administrative and holding costs are incurred. As this process might take a couple of years to complete, revenues and costs have to be discounted to their present value in the calculation of Loss Given Default (LGD), and should include any compounded interest incurred on the outstanding balance of the loan. However, in our analysis, we simplify our definition of LGD to exclude both the extra costs incurred and the interest lost, because we are not provided with any information about the legal and administrative costs associated with each loan default and repossession.

Hence, LGD is defined as the final (nominal) loss from the defaulted loan as a proportion of the outstanding loan balance at (year end of) default, where the loss is defined as the difference between the loan outstanding at default and the forced sale amount, if the property was sold at a price that is lower than the amount of the loan outstanding at default (i.e. outstanding loan at default > forced sale amount). If the property was able to fetch an amount greater than or equal to that of the outstanding loan at default, then the loss is defined to be zero. The loss is also assumed to be zero if the property was not repossessed, or repossessed but not sold, in the absence of any additional information. With the loss defined as zero, LGD is of course zero as well.
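The simplified LGD definition of this section can be written as a short Python function. The function name and the convention of passing `None` for accounts without a forced sale are assumptions made for this sketch.

```python
def loss_given_default(outstanding_at_default, forced_sale_price=None):
    """Nominal LGD as defined in Section 4.5 (no costs, no discounting).

    forced_sale_price=None stands for an account that was not repossessed,
    or was repossessed but not sold; the loss is then taken to be zero.
    """
    if forced_sale_price is None:
        return 0.0
    shortfall = outstanding_at_default - forced_sale_price
    # The loss is floored at zero: any excess goes back to the customer.
    return max(0.0, shortfall) / outstanding_at_default


print(loss_given_default(100_000, 70_000))   # sale below balance
print(loss_given_default(100_000, 120_000))  # sale covers the balance
print(loss_given_default(100_000))           # no forced sale observed
```

The floor at zero is what produces the capped LGD distribution mentioned in the introduction.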

5. The probability of repossession model

Our first model component will provide us with an estimate of the probability of repossession, given that a loan goes into default.

5.1. Modelling methodology

We first identify a set of variables that is eligible for inclusion in the repossession model. Variables that cannot be used are removed, including those which contain information which is only known at the time of default and for which no reasonably precise estimate can be produced based on their value at observation time (e.g. arrears at default), as well as those which have too many missing values or are related to housing or insurance schemes that are no longer relevant, together with those where the computation is simply not known. We then check the correlation coefficient between each pair of remaining variables, and find that none exceeds 0.6 in absolute value. Using these, a logistic regression is then fitted to the repossession training set and a backward selection method based on the Wald test is used to keep only the most significant variables (p-value of at most 0.01). We then check that the sign of each parameter estimate behaves logically, and that the parameter estimates of groups within categorical variables do not contradict intuition.
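As an illustration of the pairwise correlation screen only (not of the logistic regression or the Wald-based backward selection), the following Python sketch flags variable pairs whose absolute Pearson correlation exceeds 0.6. The variable names and values are hypothetical toy data, and the use of the Pearson coefficient is an assumption, since the text does not name the correlation measure.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.6):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical candidate variables for five accounts.
candidates = {
    "ltv": [0.50, 0.60, 0.70, 0.80, 0.90],
    "dltv": [0.55, 0.62, 0.75, 0.79, 0.95],
    "time_on_book": [3, 1, 4, 1, 5],
}
flagged = highly_correlated_pairs(candidates)
print([(a, b) for a, b, _ in flagged])
```

In this toy data only the pair (ltv, dltv) is flagged; a modeller would then keep one of the two, or fit separate model variations for each.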

5.2. Model variations

Using the methodology above, we obtain a Probability of Repossession Model R1, with four significant variables, namely the loan-to-value (LTV) ratio at the time of the loan application (start of loan), a binary indicator for whether this account has previously had a default, time on book in years, and type of security, i.e. detached, semi-detached, terraced, flat or other. In a second model, we replace the LTV at loan application and time on book with the LTV at default (DLTV); this is referred to as Probability of Repossession Model R2. Including all three variables (LTV, DLTV and time on books) in a single model would cause counter-intuitive parameter estimate signs. Another simpler repossession model fitted on the same data, against which we will compare our models, is Model R0. The latter model only has a single explanatory variable, DLTV, which is often the main driver in models used by the retail banking industry.

5.3. Performance measures

The performance measures applied here are the accuracy rate, sensitivity, specificity, and the area under the ROC curve (AUC).

In order to assess the accuracy rate (i.e. the total number of correctly predicted observations as a proportion of the total number of observations), sensitivity (i.e. the number of observations correctly predicted to be events, in this context repossessions, as a proportion of the total number of actual events), and specificity (i.e. the number of observations correctly predicted to be non-events, in this context non-repossessions, as a proportion of the total number of actual non-events) of each logistic regression model, we have to define a cut-off value for which only observations with a probability higher than the cut-off are predicted to undergo repossession. The definition of the cut-off affects the performance measures above, as it affects the number of observations which will be predicted to be repossessions or non-repossessions. For our dataset, we choose the cut-off value such that the sample proportions of actual and predicted repossessions are equal. However, we note that the exact value selected here is unimportant in the estimation of LGD itself, as the method used later to estimate LGD does not require the selection of a cut-off.
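The cut-off rule and the three rates can be sketched in Python as below. The function names and the eight toy scores are hypothetical, and the sketch assumes there are no tied scores at the boundary and that both events and non-events are present.

```python
def cutoff_matching_event_rate(probs, actuals):
    """Cut-off for which the number of predicted events equals the number of actual events.

    Assumes no ties around the boundary, and at least one event and one non-event.
    """
    n_events = sum(actuals)
    ranked = sorted(probs, reverse=True)
    # Scores strictly above the midpoint between the n-th and (n+1)-th highest are events.
    return (ranked[n_events - 1] + ranked[n_events]) / 2


def classification_rates(probs, actuals, cutoff):
    """Accuracy, sensitivity and specificity when predicting an event iff prob > cutoff."""
    predicted = [p > cutoff for p in probs]
    true_pos = sum(1 for y, y_hat in zip(actuals, predicted) if y and y_hat)
    true_neg = sum(1 for y, y_hat in zip(actuals, predicted) if not y and not y_hat)
    n_events = sum(actuals)
    n_non_events = len(actuals) - n_events
    return {
        "accuracy": (true_pos + true_neg) / len(actuals),
        "sensitivity": true_pos / n_events,
        "specificity": true_neg / n_non_events,
    }


# Toy predicted repossession probabilities and outcomes (1 = repossessed).
probs = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]
actuals = [1, 1, 0, 1, 0, 0, 1, 0]
cutoff = cutoff_matching_event_rate(probs, actuals)
rates = classification_rates(probs, actuals, cutoff)
print(cutoff, rates)
```

With four actual events, the rule places the cut-off between the fourth and fifth highest scores, so exactly four accounts are predicted to be repossessed.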

Fig. 2. Distribution of the haircut (the solid curve indicates the normal distribution). [Histogram of the haircut (x-axis, ticks from 0.025 to 2.125) against percent of observations (y-axis), for the training set where the haircut is not 0.]

The receiver operating characteristic (ROC) curve is a 2-dimensional plot of sensitivity and (1 − specificity) values for all possible cut-off values. It passes through the points (0, 0), i.e., all observations are classified as non-events, and (1, 1), i.e., all observations are classified as events. A straight line through (0, 0) and (1, 1) represents a model that classifies observations randomly as either events or non-events. Thus, the closer the ROC curve is to point (0, 1), the better the model is in terms of discerning which observations belong in which category. As the ROC curve is independent of the cut-off threshold, the area under the curve (AUC) gives an unbiased assessment of the effectiveness of the model in terms of classifying observations.
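Because the AUC does not depend on a cut-off, it can be computed directly from the scores. The sketch below uses the standard equivalence between the AUC and the probability that a randomly chosen event receives a higher score than a randomly chosen non-event (ties counted as one half). The function name and toy data are hypothetical, and the pairwise loop is quadratic, so it is meant for illustration rather than for a dataset of the size used here.

```python
def auc(probs, actuals):
    """AUC as the share of (event, non-event) pairs ranked correctly; ties count 1/2."""
    event_scores = [p for p, y in zip(probs, actuals) if y]
    non_event_scores = [p for p, y in zip(probs, actuals) if not y]
    wins = 0.0
    for p in event_scores:
        for q in non_event_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(event_scores) * len(non_event_scores))


# Toy predicted repossession probabilities and outcomes (1 = repossessed).
probs = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]
actuals = [1, 1, 0, 1, 0, 0, 1, 0]
print(auc(probs, actuals))                    # 0.75
print(auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0, perfect separation
print(auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.5, uninformative scores
```

A value of 0.5 corresponds to the diagonal line described above, and 1.0 to a curve passing through the point (0, 1).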

We also use the DeLong, DeLong, and Clarke-Pearson (1988) test to assess whether there are any significant differences between the AUC values of different models.

5.4. Model results

Applying the DeLong, DeLong and Clarke-Pearson test, we find that the AUC values for model R2 are significantly better than those for R0 (see Table 1), whereas R1 performs worse than both. Hence, model R2 is selected for inclusion in our two-stage model. Table 2 gives the direction of parameter estimates used in the probability of repossession model R2, together with a possible explanation. The parameter estimate values and p-values of all repossession model variations are given in Tables A.1–A.3 of the Appendix.

6. The haircut model

The haircut model is only applicable to observations which have undergone the repossession and forced sale process, where the haircut is defined as the ratio of the forced sale price to the valuation of security at default. Therefore, securities which were not repossessed, or were repossessed but not sold, do not have a haircut value, and are thus excluded from the development of the haircut model.

An OLS model is also developed to model the haircut standard deviation explicitly, as a function of the time on books, as was suggested by Lucas (2006).

The distribution of the haircut is shown in Fig. 2, with the solid curve showing the normal distribution. Statistics from the Kolmogorov-Smirnov and Anderson-Darling tests (Peng, 2004) suggest non-normality, with p-values of <0.01 and <0.005 respectively, but for the purposes of the prediction of LGD, we approximate the haircut by a normal distribution.
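To see why a normal approximation is convenient, consider one way in which a predicted haircut distribution and a repossession probability could be combined into an expected LGD. The sketch below is an assumption-laden illustration and not necessarily the combination the paper uses in Section 7: it takes the simplified LGD definition of Section 4.5, assumes the haircut H of a repossessed property is normal with a given mean and standard deviation, and assumes a zero loss without repossession. Under these assumptions the loss fraction given repossession is max(0, 1 − H/DLTV), and its expectation has a closed form. All names and input values are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def expected_lgd(p_repossession, dltv, haircut_mean, haircut_sd):
    """Expected LGD = P(repossession) * E[max(0, DLTV - H)] / DLTV, with H normal.

    Uses E[max(0, c - H)] = (c - mu) * Phi(d) + sigma * phi(d), d = (c - mu) / sigma.
    """
    if haircut_sd <= 0:
        raise ValueError("haircut_sd must be positive")
    d = (dltv - haircut_mean) / haircut_sd
    expected_shortfall = (dltv - haircut_mean) * normal_cdf(d) + haircut_sd * normal_pdf(d)
    return p_repossession * expected_shortfall / dltv


# Hypothetical account: 35% repossession probability, DLTV 0.9, haircut ~ N(0.8, 0.2^2).
print(expected_lgd(0.35, dltv=0.9, haircut_mean=0.8, haircut_sd=0.2))
```

Note that no repossession cut-off is needed: the probability enters as a weight. The expected loss is also positive even when the mean haircut exceeds the DLTV, because part of the haircut distribution still falls below it.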


Table 1
Repossession model performance statistics.

| Model | AUC | Cut-off | Specificity (%) | Sensitivity (%) | Accuracy (%) |
|---|---|---|---|---|---|
| R1, test set (LTV, time on books, security, previous default) | 0.727 | 0.435 | 57.449 | 75.688 | 69.186 |
| R2, test set (DLTV, security, previous def) | 0.743 | 0.432 | 59.398 | 76.203 | 70.213 |
| R0, test set (DLTV) | 0.737 | 0.436 | 58.626 | 76.008 | 69.812 |
| DeLong et al. p-value, R1 vs. R0 | <0.001 | | | | |
| DeLong et al. p-value, R2 vs. R0 | <0.001 | | | | |

Table 2
Parameter estimate signs for the probability of repossession model R2.

Variable | Relationship to probability of repossession (given default) | Explanation
DLTV (LTV at default) | + | If a large proportion of the loan is tied up in security, the likelihood of repossession increases
Previous default | + | The probability of repossession increases if the account has been in default before
Security | − | Lower-range property types such as flats are more likely to be repossessed in the case of default

Table 3
Haircut model performance statistics.

Model | MSE | MAE | R2
H1, test set | 0.039 | 0.147 | 0.143
H2, test set | 0.039 | 0.148 | 0.131

6.1. Modelling methodology

The top and bottom 0.05% of observations (26 cases) for the haircut are truncated before we establish the set of eligible variables to be considered in the development of an OLS linear regression model for the haircut model. We also check the relationship between the variables and the haircut. In particular, the ratio of the valuation of security at default to the average property valuation in the region displays a high level of non-linearity (see Fig. 3), and is binned into 6 groups for model development. Backward stepwise regression is used to remove insignificant variables and individual parameter estimate signs are checked for intuitiveness. We also check for intuition within categorical variables, and examine the variance inflation factors (VIF).6
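The variance inflation factor of a regressor is conventionally defined as 1 / (1 − R²), where R² is obtained by regressing that variable on all the other regressors. The sketch below covers only the special case of two regressors, where R² reduces to the squared Pearson correlation; it is meant as an illustration of the collinearity check, not as the procedure used for the models in this paper. The function names and the toy data are hypothetical.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x, y):
    """VIF = 1 / (1 - R^2); with exactly two regressors, R^2 = r^2."""
    r_squared = pearson_r(x, y) ** 2
    return 1.0 / (1.0 - r_squared)


# Hypothetical values: LTV at application, LTV at default, time on books (years).
ltv = [0.60, 0.70, 0.80, 0.90, 0.95]
dltv = [0.62, 0.69, 0.83, 0.88, 0.97]
tob = [3.0, 1.0, 5.0, 2.0, 4.0]

print(vif_two_predictors(ltv, dltv))  # strongly related pair, large VIF
print(vif_two_predictors(ltv, tob))   # weakly related pair, VIF close to 1
```

With more than two regressors, each R² would have to come from a full auxiliary regression rather than from a single pairwise correlation.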

6.2. Model variations

Using the methodology above, we obtain a haircut model H1 with seven significant variables, namely the loan-to-value (LTV) ratio at the time of the loan application (start of loan), a binary indicator for whether this account has previously had a default, time on the books in years, ratio of the valuation of the property to the average in that region (binned), type of security (i.e. detached, semi-detached, terraced, flat or other), age group of property and region. In a second model, we replace the LTV at loan application and the time on books with the LTV at default (DLTV), referred to as haircut model H2; note that, as previously, including all three variables (LTV, DLTV and time on books) in a single model would cause counter-intuitive parameter estimate signs.

6 If variables within the model are highly correlated with each other, this would be reflected in a high value of VIF. Any value above 10 implies severe collinearity amongst variables, while values less than 2 mean that the variables are almost independent (Fernandez, 2007).

Fig. 3. Relationship between the haircut and the (ranked) valuation of security at default to average property valuation in the region ratio. [Figure not reproduced; accounts that underwent repossession; x-axis: rank in ascending order of grouped valuation/average property in region; y-axis: mean haircut.]

Comparative performance measures for the two models are reported in the following subsection.

6.3. Performance measures

The performance measures considered here are the R2 value, the mean squared error (MSE) and the mean absolute error (MAE). To create a graphical representation of the results, we also present a binned scatterplot of the predicted haircut value bands against the actual haircut values, where the predicted haircut values are put into ascending order and binned into equal-frequency value bands; the mean actual haircut value is then compared with the mean predicted haircut value in each haircut band.

Table 4
Parameter estimate signs of haircut model H1.

Variable | Relation to haircut | Explanation
LTV | + | Refer to Fig. 4 and the explanation in Section 6.4
Ratio of the valuation of security at default to the average property valuation in that region, binned | +/− | Medium-end properties (relative to the region the property is in) have a higher haircut than lower-end properties, but higher-end properties tend to have the lowest haircut (see Fig. 3 in Section 6.1)
Previous default | + | The haircut is higher for accounts which have defaulted previously
TOB (time on books in years) | + | Older loans imply a greater degree of uncertainty and error in the estimation of the value of security at default, so a higher haircut is possible
Security | + | The haircut tends to be higher for higher-end property types (e.g. detached)
Age group of property (oldest to newest) | + | The haircut tends to be higher for newer properties
Region (a) | N/A | The haircut varies across regions

(a) Since regional differences may not persist over time, one can alternatively choose to omit the geographic dummy variables from the model. Our robustness tests indicate that the model fit is slightly lower without these, while the parameter estimate signs and estimates of the other variables remained stable.
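The performance measures and the equal-frequency banding described in Section 6.3 can be sketched as follows. The helper names and the toy values are hypothetical, and the way leftover observations are spread over the bands is an assumption, since the paper does not state it.

```python
def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot, computed against the mean of the actuals."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def band_means(actual, predicted, n_bands):
    """Sort by predicted value, split into n_bands groups of (near-)equal
    size, and return (mean predicted, mean actual) for each band."""
    pairs = sorted(zip(predicted, actual))
    n = len(pairs)
    bands = []
    for b in range(n_bands):
        chunk = pairs[(b * n) // n_bands:((b + 1) * n) // n_bands]
        if chunk:
            bands.append((sum(p for p, _ in chunk) / len(chunk),
                          sum(a for _, a in chunk) / len(chunk)))
    return bands


# Hypothetical actual and predicted haircuts.
actual = [0.5, 0.75, 1.0, 0.75]
predicted = [0.5, 0.75, 1.0, 1.0]

print(mse(actual, predicted), mae(actual, predicted), r_squared(actual, predicted))
print(band_means(actual, predicted, n_bands=2))
```

Plotting the pairs returned by `band_means` against a 45-degree reference line gives the kind of binned scatterplot described above; bands lying close to that line indicate unbiased predictions at band level.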

6.4. Model results

First, we note that all parameters for all models have low VIF values, with the only ones above 2 belonging to geographical indicators. In the haircut model, the combination of LTV and the time on books seems to be able to capture the information carried in DLTV, because, as can be observed from Table 3, model H1 performs best. This could be because LTV gives an indication of the (initial) quality of the customer, whereas the values of DLTV could be due to changes in house prices since the purchase of the property. Based on this, model H1 is selected as the haircut model to be used in the LGD estimation.

Table 4 gives details of the parameter estimate signs. From it, we can see that a greater LTV at the start implies a higher haircut (i.e. a higher forced sale price). This means that the larger the original loan a debtor took relative to the property value, the higher the forced sale price of the security would be in the event of a default and repossession. At first, it might seem as though this parameter estimate sign could be confused due to the number of variables in the haircut model, or due to some hidden correlation between variables. In order to rule out this possibility, we look at the relationship between the LTV at the start and the haircut. From Fig. 4, we observe that there does indeed appear to be a positive relationship between the haircut and the LTV. Part of the explanation for this might be found in the policy decisions made by the bank. For loans with high loan-to-value ratios, due to the large amount (relative to the loan) which the bank has committed to the property, when the account does go into default and subsequent repossession, the bank may be reluctant to let the repossessed property go unless it is able to fetch a price close to the current property valuation. Another possible reason could be that borrowers with a low LTV are likely to sell early and only end up in repossession when they know that the house is in a bad state and unlikely to make anything near its indexed valuation.

To further validate the model, we also include a scatterplot of the mean (grouped) predicted and actual haircut in Fig. 5. From it, we observe that our model produces unbiased estimates of the haircut. Parameter estimates of all models can be found in Tables A.4 and A.5 of the Appendix.

Fig. 4. Relationship between the haircut and the (ranked) LTV at the time of the loan application. [Figure not reproduced; accounts that underwent repossession; x-axis: rank in ascending order of grouped LTV; y-axis: mean haircut.]

6.5. Haircut standard deviation modelling

To be able to produce an expected value of the LGD (see Section 7.1), we will require not only a point estimate of the haircut, but also a model component for the haircut variability. A further inspection reveals that the standard deviation of the haircut increases with a longer time on the books (see Fig. 6), which is to be expected, because the valuation of a property is usually updated using publicly available house price indices (instead of commissioning a new valuation process), and the longer an account has been on the books, the greater the uncertainty and error in the estimation of the current valuation of the property, which will affect the error in the prediction of the haircut as well.

As was suggested by Lucas (2006), for modelling this relationship, a simple OLS model was fitted that estimates the standard deviation for different bins of time on books.7 The time on books is binned into 6 month intervals, and the standard deviation of the haircut is calculated for each group based on the mean haircut in that group. This model will be used later to calculate the expected values of LGD (see Section 7.1). The performance statistics for this auxiliary model are given in Table 5, and parameter estimates can be found in Table A.6 of the Appendix.

7 Alternatively, because the standard deviation of the haircut is different for different groups of observations, the weighted least squares regression method was considered to adjust for the heteroscedasticity in the OLS model developed in Section 6.4. Two different weights were experimented with: the error term variance of each observation (from running an OLS model for the haircut) and the time on books. Both models produced parameter estimates similar to those of the selected haircut model, which suggests that the OLS model was able to produce robust parameter estimates even though the homoscedasticity assumption was violated. Also, because the models did not explicitly model and produce the standard deviation of the haircut, which is required in the calculation of the expected LGD, a separate OLS model for the standard deviation is necessary.

Fig. 5. Prediction performance of the haircut test set. [Figure not reproduced; mean predicted haircut and mean actual haircut, banded on the predicted haircut; x-axis: rank in ascending order of grouped predicted haircut; y-axis: mean haircut.]

Fig. 6. Mean haircut standard deviation by bins of the time on the books. [Figure not reproduced; x-axis: time on book bins; y-axis: mean haircut standard deviation.]

Table 5
Haircut standard deviation model performance statistics.

Model | MSE | MAE | R2
Training set | 0.0001 | 0.0046 | 0.9315
Test set | 0.0002 | 0.0105 | 0.8304
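A minimal sketch of the auxiliary model described in Section 6.5 is given below: accounts are grouped into 6 month bins of time on books, the haircut standard deviation is computed per bin, and a straight line is fitted to these values by OLS. Several details are assumptions made for illustration and are not specified in the text: the use of the sample standard deviation (n − 1 in the denominator), the use of each bin's lower edge in years as the regressor, and all function names and data.

```python
import math
from collections import defaultdict


def std_by_tob_bin(tob_years, haircuts, bin_width_years=0.5):
    """Sample standard deviation of the haircut within each time-on-books bin."""
    groups = defaultdict(list)
    for tob, h in zip(tob_years, haircuts):
        groups[int(tob // bin_width_years)].append(h)
    result = {}
    for bin_index, values in sorted(groups.items()):
        if len(values) < 2:
            continue  # a standard deviation needs at least two observations
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        result[bin_index] = math.sqrt(variance)
    return result


def fit_line(x, y):
    """Closed-form simple OLS for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical repossessed accounts: time on books (years) and observed haircut.
tob_years = [0.1, 0.2, 0.6, 0.7, 1.1, 1.2]
haircuts = [0.75, 0.85, 0.70, 0.90, 0.65, 0.95]

stds = std_by_tob_bin(tob_years, haircuts)
bin_start_years = [b * 0.5 for b in stds]          # lower edge of each bin
intercept, slope = fit_line(bin_start_years, list(stds.values()))
print(intercept, slope)
```

The fitted line can then be evaluated at an account's time on books to obtain the predicted haircut standard deviation needed for the expected LGD calculation.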

7. Loss given default model

Having estimated the probability of repossession model, the haircut model and the haircut standard deviation model, we now combine these models to get an estimate of the loss given default. Here we illustrate two ways of combining the component models, report their respective LGD predictions, and advocate the use of the more conservative approach, producing an expected value of LGD that takes into account the haircut variability. We also compare these results with the single-stage model predictions and performance statistics.

7.1. Modelling methodology

A first approach, referred to in our paper as the “haircut point estimate” approach, would be to keep the probabilities derived under the probability of repossession model and apply the haircut model to each observation. This would give all observations a predicted haircut value in the event of repossession. Using this predicted value of the haircut, the predicted sale price and predicted loss (outstanding balance at default less sale proceeds), if any, can be calculated. We then find the predicted LGD by multiplying the probability of repossession by this predicted loss if repossession happens. Although this method does produce an estimate of LGD, regardless of whether the observation is predicted to enter repossession or not, it uses only a single value of the haircut (although it is the most probable value). However, if the true haircut happens to be lower than predicted, the sale proceeds would be overestimated, which would mean that a loss could still be incurred (provided that the haircut falls below DLTV). This is an illustration of the fact that misleading LGD predictions can be produced if the component models are not combined appropriately. Hence, in order to produce a true expected value for LGD, one should also take into account the distribution of the haircut estimate and the associated effect on the loss in its left tail.
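The haircut point estimate approach can be sketched as below. The division by the outstanding balance, which turns the loss amount into an LGD ratio, is an assumption made here for consistency with how LGD is expressed elsewhere in the paper; the function name and the numbers are hypothetical.

```python
def lgd_point_estimate(p_repossession, predicted_haircut, valuation, balance):
    """Predicted LGD using a single predicted haircut value.

    Sale price = haircut * valuation at default; the loss is the part of
    the outstanding balance not covered by the sale, floored at zero.
    """
    sale_price = predicted_haircut * valuation
    loss_if_repossessed = max(0.0, balance - sale_price)
    return p_repossession * loss_if_repossessed / balance


# Hypothetical account where the predicted sale price falls short of the balance.
print(lgd_point_estimate(0.5, 0.75, valuation=100_000.0, balance=100_000.0))  # 0.125

# Hypothetical account where the predicted sale price covers the balance:
# the point estimate assigns no loss at all, whatever the haircut uncertainty.
print(lgd_point_estimate(0.5, 0.75, valuation=100_000.0, balance=60_000.0))   # 0.0
```

The second account shows the weakness discussed above: as soon as the predicted sale price covers the balance, the predicted LGD is exactly zero, even though a lower realised haircut could still produce a loss.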

Hence, the second and more conservative approach, suggested by Lucas (2006), for example, and referred to here as the “expected shortfall” approach, also takes into account the probabilities of other values of the haircut occurring, and the different levels of loss associated with these different levels. To do this, we first use the probability of repossession model to obtain an estimate of the probability of repossession, given that an account goes into default. We then apply the haircut model to the same dataset to get an estimate of the haircut, $H_j$, for each observation $j$, regardless of whether the security is likely to be repossessed. A minimum value of zero is set for the predicted haircut, as a negative haircut has no meaning. The haircut standard deviation model is then applied to each observation $j$ to get a predicted haircut standard deviation, $\sigma_j$, depending on its time on books (see Section 6.5). From these predicted values, we approximate the distribution of each predicted haircut by a normal distribution, $h_j \sim N(H_j, \sigma_j^2)$.

Fig. 7. Distributions of the observed LGD (empirical), predicted LGD from the two-stage haircut point estimate model (HC pt. est.), two-stage expected shortfall model (E.shortfall), and single-stage model (single stage) (from top to bottom). [Figure not reproduced; comparative histograms for the test sets; x-axis: LGD, from 0.01 to 0.97; y-axis: count. Mean and standard deviation shown per panel: empirical 0.103009 and 0.184756; HC pt. est. 0.096824 and 0.107749; E.shortfall 0.105626 and 0.102394; single stage 0.106856 and 0.080621.]

For the sake of simplicity, the subscript j, which represents individual observations, will be dropped from here on.

As long as the haircut (ratio of the sale amount to the valuation of the property at default) is greater than DLTV (ratio of the outstanding balance at default to the valuation of the property at default), and ignoring any additional administrative and repossession-associated costs, the proceeds from the sale will be able to cover the outstanding balance on the loan, i.e. there will be no shortfall. Hence, the expected shortfall, expressed as a proportion of the indexed valuation of the property, is:

$$E(\text{shortfall percent} \mid \text{repossession}) = \int_{-\infty}^{\mathrm{DLTV}} p(h)\,(\mathrm{DLTV} - h)\,dh, \tag{2}$$

where p(.) denotes the probability density function of the distribution for h.

To convert the latter into a standard normal distribution, we let:

$$z = \frac{h - H}{\sigma} \sim N(0, 1); \qquad D = \frac{\mathrm{DLTV} - H}{\sigma}.$$

Hence, the expected shortfall can easily be derived as follows:

$$
\begin{aligned}
E(\text{shortfall percent} \mid \text{repossession})
&= \int_{-\infty}^{D} p(z)\,(D - z)\,\sigma\,dz \\
&= \sigma D \int_{-\infty}^{D} p(z)\,dz - \sigma \int_{-\infty}^{D} p(z)\,z\,dz \\
&= \sigma D\,\mathrm{CDF}_Z(D) - \sigma\bigl(-\mathrm{PDF}_Z(D)\bigr),
\end{aligned} \tag{3}
$$

where $\mathrm{CDF}_Z(D)$ and $\mathrm{PDF}_Z(D)$ denote the cumulative distribution function and probability density function of the standard normal distribution, respectively.

Table 6
Performance measures of two-stage and single-stage LGD models.

Method, dataset | MSE | MAE | R2
Single stage, test set | 0.026 | 0.121 | 0.233
Two-stage (haircut point estimate), test set | 0.025 | 0.108 | 0.268
Two-stage (expected shortfall), test set | 0.025 | 0.101 | 0.266

The expected loss given default is then obtained from the probability of non-repossession and the expected shortfall calculated for the repossession scenario (see Eq. (4), below). The probability of an account undergoing repossession given that it has gone into default is multiplied by the expected LGD the account would incur in the event of repossession. We also multiply the probability of an account not going into repossession by the expected LGD for non-repossessions (denoted by c). We can use the average observed LGD for actual non-repossessions as the expected conditional LGD for non-repossessions.

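$$
\begin{aligned}
E(\text{loss} \mid \text{default}) ={}& \bigl[E(\text{shortfall percent} \mid \text{repossession}) \times \text{indexed valuation} \times P(\text{repossession} \mid \text{default})\bigr] \\
&+ \bigl[c \times (1 - P(\text{repossession} \mid \text{default}))\bigr],
\end{aligned} \tag{4}
$$

where c is the loss associated with non-repossessions (assumed to be 0 in the absence of additional information).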

Finally, we obtain the predicted LGD by taking the ratio of E(loss | default) to the (estimated) outstanding balance at default.
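The expected shortfall calculation of Eqs. (3) and (4) can be sketched in a few lines. The function and parameter names are hypothetical, the standard normal CDF is obtained from the error function in Python's standard library, and c is treated as a loss amount in the same units as the balance, following Eq. (4) as written.

```python
import math


def norm_pdf(x):
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_shortfall_pct(dltv, haircut_mean, haircut_sd):
    """Eq. (3): sigma * D * CDF(D) + sigma * PDF(D), with D = (DLTV - H) / sigma."""
    d = (dltv - haircut_mean) / haircut_sd
    return haircut_sd * d * norm_cdf(d) + haircut_sd * norm_pdf(d)


def expected_lgd(p_repossession, haircut_mean, haircut_sd, valuation, balance, c=0.0):
    """Eq. (4), divided by the outstanding balance at default."""
    haircut_mean = max(0.0, haircut_mean)  # negative predicted haircuts are floored at zero
    dltv = balance / valuation             # DLTV = balance at default / valuation at default
    shortfall = expected_shortfall_pct(dltv, haircut_mean, haircut_sd) * valuation
    expected_loss = shortfall * p_repossession + c * (1.0 - p_repossession)
    return expected_loss / balance


# Hypothetical account whose predicted haircut equals its DLTV (0.75).
lgd = expected_lgd(p_repossession=0.5, haircut_mean=0.75, haircut_sd=0.2,
                   valuation=100_000.0, balance=75_000.0)
print(round(lgd, 4))
```

For this account the haircut point estimate approach would predict an LGD of exactly zero, because the predicted sale price just covers the balance, whereas the expected shortfall approach assigns a positive expected loss coming from the left tail of the haircut distribution.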

7.2. Alternative single-stage model

To be able to compare this two-stage model, we also developed a simple single-stage model using the same data. A backward stepwise selection on the set of eligible variables used earlier in the two-stage model building was applied, and the resulting model parameter estimates are given in Table A.7 of the Appendix. However, it is noted that whatever the results of the single-stage model, because it predicts LGD directly based on loan and collateral characteristics, it does not provide the same insights into the two different drivers (i.e. repossession risk and sale price haircut) of mortgage loss, and as such does not provide as rich a framework for stress testing.

Fig. 8. Scatterplot of the predicted and actual LGD in LGD bands. [Figure not reproduced; mean predicted LGD against mean actual LGD per LGD band for the expected shortfall method, the single stage method and the haircut point estimate method, with a reference line; axes: predicted LGD and actual LGD.]

The performance measures of this single-stage model are then compared with those of the preferred two-stage model developed in the previous section (i.e. using the expected shortfall approach), as well as with the two-stage model that results from the so-called “haircut point estimate” approach.

7.3. Model performance

Using the same performance measures as were used for the haircut model, we compare the MSE, MAE and R2 values of our two-stage models and the single-stage model (see Table 6). It is observed that both two-stage model variations achieve a substantially better R2 of just under 0.27 (compared to 0.233 for the single-stage model) on the LGD test set, which is competitive with other LGD models currently used in the industry.

The distributions of the predicted and actual LGD for all LGD models are shown in Fig. 7. In the original empirical distribution of LGD (see the top section of Fig. 7), there is a large peak near 0 (where the losses were zero either because there was no repossession, or because the sale of the house was able to cover the remaining loan amount). Firstly, we observe that the single-stage model (shown in the bottom section of Fig. 7) is unable to reproduce the peak near 0. Moreover, note that the two-stage model using the haircut point estimate appears to reproduce the empirical distribution of LGD most closely, as it is able to bring out the peak near 0. Although the R2 values achieved by the two two-stage LGD model variations are very similar (see Table 6), their LGD distributions are quite different. The haircut point estimate approach is shown to underestimate the average loss (cf. mean predicted LGD from the haircut point estimate method being lower than the mean observed LGD). Unlike the former approach, the expected shortfall method takes a more conservative approach in its estimation of LGD, which takes into account the haircut distribution and its effect on the expected loss, based on the probabilities of different haircut values occurring. This will make a difference, especially for observations which would be predicted to have a low or zero LGD under the haircut point estimate method, because these very accounts are now assigned at least some expected loss amount, hence moving observations out of the peak and into the low LGD bins.

To further verify the extent to which these various models are able to produce unbiased estimates at an LGD loan pool level, we create a graphical representation of the results. We look at a binned scatterplot of predicted LGD value bands against actual LGD values, where the predicted LGD values are put into ascending order and binned into equal-frequency value bands. For each method we used in the calculation of LGD, we plot the mean actual LGD value against the mean predicted LGD value (for that LGD band) onto a single graph, given in Fig. 8. Note that both of the two-stage models are consistently able to estimate LGD fairly closely, whereas the single-stage model either overestimates LGD (in the lower-left hand region of the graph, which represents observations that have a low LGD) or underestimates it (in the upper-right hand region of the graph, which represents observations with a high LGD). Furthermore, the expected shortfall approach is shown to produce the most reliable estimates in the lower-LGD regions, outperforming the haircut point estimate approach in the lower-left part of the graph, where the haircut point estimate approach underestimates the risk (i.e. the estimates fall below the diagonal).

Finally, in order to check the robustness of our two-stage LGD model, we have also experimented with re-estimating the two component models, this time including only the first instance of default for customers with multiple defaults (i.e. not all instances of default are included for observations with multiple defaults). Detailed results are not reported here, but we obtained the same parameter estimate signs for both of the component models and for the LGD model itself, and the parameter estimates were similar in size.

8. Conclusions and further research

In this paper, we developed and validated a number of models for estimating the LGD of mortgage loans using a large set of recovery data from residential mortgage defaults from a major UK bank. The objectives of this paper were two-fold. Firstly, we aimed to evaluate the added value of a probability of repossession model with more than just LTV at default as its explanatory variable. We have therefore developed a probability of repossession model with three variables, and showed that it is significantly better than a model with only the commonly used DLTV.


Secondly, we wanted to validate the approach of using two component models, a probability of repossession model and a haircut model, which consists of the haircut model itself and the haircut standard deviation model, to create a model that produces estimates for LGD. Here, two methods are explained, both of which will produce a value of the predicted LGD for every default observation, because the haircut model, which gives a predicted sale amount and predicted shortfall, will be applied to all observations regardless of their probabilities of repossession. However, we then show how the first method, which uses only the haircut point estimate, could end up underestimating the LGD predictions. The second and preferred method (expected shortfall) derives the expected loss from an estimated normal haircut distribution, having the predicted haircut from the haircut model as the mean, and with the standard deviation obtained from the haircut standard deviation model.

For comparison purposes, we also developed a single-stage model. This model produced a lower R2 value, and was also unable to fully emulate the actual distribution of LGD.

Having shown that the proposed two-stage modelling approach works well on real-life data, in our further research we intend to explore the inclusion of macroeconomic variables in either one or both of the probability of repossession model and the haircut model. These macroeconomic variables might include the unemployment rate, the inflation rate, the interest rate, or some indication of the amount of borrowing in each economic year. Finally, we also consider the use of alternative methods, for example survival analysis, to better predict and estimate the time periods between each milestone (repossession and sale) of a defaulted loan account.

Acknowledgments

We thank the bank which kindly provided the dataset that enabled this work to be carried out, and Professor Lyn Thomas for his guidance throughout this work. We also thank the editor and reviewers who have made invaluable contributions with their insightful feedback and recommendations. Any mistakes are solely ours.

Appendix

See Tables A.1–A.7.

Table A.1
Parameter estimates for the probability of repossession model R0.

Variable | Variable explanation | Estimate | StdErr | WaldChiSq | ProbChiSq
Intercept | – | −3.069 | 0.028 | 12235.289 | <0.01
DLTV | Loan to value at default | 2.821 | 0.029 | 9449.349 | <0.01

Table A.2
Parameter estimates for the probability of repossession model R1.

Variable | Variable explanation | Estimate | StdErr | WaldChiSq | ProbChiSq
Intercept | – | −1.138 | 0.040 | 795.605 | <0.01
LTV | Loan to value at loan application | 2.101 | 0.040 | 2809.703 | <0.01
TOB | Time on books (in years) | −0.188 | 0.003 | 2899.616 | <0.01
Previous default | Indicator for previous default | 0.102 | 0.034 | 8.869 | <0.01
Security0 (base) | Flat or other | – | – | – | –
Security1 | Detached | −0.625 | 0.031 | 413.989 | <0.01
Security2 | Semi-detached | −0.670 | 0.024 | 787.436 | <0.01
Security3 | Terraced | −0.421 | 0.021 | 395.497 | <0.01

Table A.3
Parameter estimates for the probability of repossession model R2.

Variable | Variable explanation | Estimate | StdErr | WaldChiSq | ProbChiSq
Intercept | – | −2.570 | 0.034 | 5769.803 | <0.01
DLTV | Loan to value at default | 2.679 | 0.029 | 8295.648 | <0.01
Previous default | Indicator for previous default | −0.471 | 0.032 | 211.064 | <0.01
Security0 (base) | Flat or other | – | – | – | –
Security1 | Detached | −0.461 | 0.031 | 219.425 | <0.01
Security2 | Semi-detached | −0.546 | 0.024 | 503.458 | <0.01
Security3 | Terraced | −0.343 | 0.022 | 253.470 | <0.01
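As an illustration of how the estimates in Table A.3 might be turned into a probability, the sketch below applies them through a logistic link. This is an assumption: the section does not restate the functional form, and the Wald chi-square statistics are merely consistent with a logistic regression. The coding of the previous-default indicator is also not stated, so the direction of that effect in this sketch should not be over-interpreted. The function and constant names are hypothetical.

```python
import math

# Estimates copied from Table A.3 (model R2); "flat or other" is the base level.
INTERCEPT = -2.570
COEF_DLTV = 2.679
COEF_PREVIOUS_DEFAULT = -0.471
COEF_SECURITY = {
    "flat_or_other": 0.0,
    "detached": -0.461,
    "semi_detached": -0.546,
    "terraced": -0.343,
}


def p_repossession(dltv, previous_default, security):
    """P(repossession | default), ASSUMING a logistic link and an indicator
    coded as 1 when the account has defaulted before."""
    linear = (INTERCEPT
              + COEF_DLTV * dltv
              + COEF_PREVIOUS_DEFAULT * (1.0 if previous_default else 0.0)
              + COEF_SECURITY[security])
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical accounts with a DLTV of 1.0 and no previous default.
print(p_repossession(1.0, False, "flat_or_other"))
print(p_repossession(1.0, False, "detached"))
```

Under these assumptions, a higher DLTV raises the predicted probability, and flats receive a higher probability than the other security types at the same DLTV, in line with the signs reported in Table 2 for those two variables.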


Table A.4
Parameter estimates for haircut model H1.

Variable | Variable explanation | Estimate | StdErr | ProbT | VIF
Intercept | – | 0.508 | 0.009 | <0.01 | 0.000
LTV | Loan to value at loan application | 0.243 | 0.007 | <0.01 | 1.136
TOB | Time on books (in years) | 0.005 | 0.001 | <0.01 | 1.251
VVAratio1 (base) | Value of property/region average ≤ 0.9 | – | – | – | –
VVAratio2 | 0.9 < Value of property/region average ≤ 1.2 | −0.005 | 0.004 | 0.248 | 1.134
VVAratio3 | 1.2 < Value of property/region average ≤ 1.5 | −0.059 | 0.006 | <0.01 | 1.149
VVAratio4 | 1.5 < Value of property/region average ≤ 1.8 | −0.092 | 0.008 | <0.01 | 1.127
VVAratio5 | 1.8 < Value of property/region average ≤ 2.4 | −0.090 | 0.009 | <0.01 | 1.161
VVAratio6 | Value of property/region average > 2.4 | −0.138 | 0.009 | <0.01 | 1.226
Previous default | Indicator for previous default | 0.042 | 0.006 | <0.01 | 1.168
Propage1 | Very old property (before 1919) | −0.085 | 0.003 | <0.01 | 1.273
Propage2 | Old property (1919–1945) | −0.032 | 0.004 | <0.01 | 1.194
Propage3 (base) | Built after 1945 | – | – | – | –
Security0 (base) | Flat or other | – | – | – | –
Security1 | Detached | 0.165 | 0.006 | <0.01 | 1.875
Security2 | Semi-detached | 0.129 | 0.004 | <0.01 | 1.764
Security3 | Terraced | 0.094 | 0.003 | <0.01 | 1.739
Region1 | North | −0.112 | 0.010 | <0.01 | 1.753
Region2 | Yorkshire & Humberside | −0.095 | 0.008 | <0.01 | 2.898
Region3 | North West | −0.099 | 0.008 | <0.01 | 3.163
Region4 | East Midlands | −0.100 | 0.008 | <0.01 | 2.489
Region5 | West Midlands | −0.065 | 0.008 | <0.01 | 2.449
Region6 | East Anglia | −0.067 | 0.009 | <0.01 | 1.968
Region7 | Wales | −0.115 | 0.009 | <0.01 | 2.140
Region8 | South West | −0.047 | 0.008 | <0.01 | 3.272
Region9 | South East | −0.062 | 0.007 | <0.01 | 6.348
Region10 | Greater London | −0.010 | 0.007 | 0.166 | 5.214
Region11 | Northern Ireland | −0.034 | 0.014 | 0.017 | 1.256
Region12 (base) | Scotland or others/missing | – | – | – | –

Table A.5
Parameter estimates for the haircut model H2.

Variable | Variable explanation | Estimate | StdErr | ProbT | VIF
Intercept | – | 0.591 | 0.008 | <0.01 | 0.000
DLTV | Loan to value at default | 0.162 | 0.005 | <0.01 | 1.175
VVAratio1 (base) | Value of property/region average ≤ 0.9 | – | – | – | –
VVAratio2 | 0.9 < Value of property/region average ≤ 1.2 | −0.011 | 0.004 | <0.01 | 1.126
VVAratio3 | 1.2 < Value of property/region average ≤ 1.5 | −0.069 | 0.006 | <0.01 | 1.141
VVAratio4 | 1.5 < Value of property/region average ≤ 1.8 | −0.108 | 0.008 | <0.01 | 1.116
VVAratio5 | 1.8 < Value of property/region average ≤ 2.4 | −0.108 | 0.009 | <0.01 | 1.149
VVAratio6 | Value of property/region average > 2.4 | −0.158 | 0.009 | <0.01 | 1.209
Previous default | Indicator for previous default | 0.064 | 0.005 | <0.01 | 1.010
Propage1 | Very old property (before 1919) | −0.079 | 0.003 | <0.01 | 1.261
Propage2 | Old property (1919–1945) | −0.030 | 0.004 | <0.01 | 1.193
Propage3 (base) | Built after 1945 | – | – | – | –
Security0 (base) | Flat or other | – | – | – | –
Security1 | Detached | 0.162 | 0.006 | <0.01 | 1.874
Security2 | Semi-detached | 0.126 | 0.004 | <0.01 | 1.761
Security3 | Terraced | 0.092 | 0.003 | <0.01 | 1.736
Region1 | North | −0.109 | 0.010 | <0.01 | 1.752
Region2 | Yorkshire & Humberside | −0.094 | 0.008 | <0.01 | 2.897
Region3 | North West | −0.098 | 0.008 | <0.01 | 3.159
Region4 | East Midlands | −0.112 | 0.008 | <0.01 | 2.497
Region5 | West Midlands | −0.076 | 0.008 | <0.01 | 2.454
Region6 | East Anglia | −0.102 | 0.009 | <0.01 | 2.007
Region7 | Wales | −0.125 | 0.009 | <0.01 | 2.141
Region8 | South West | −0.080 | 0.008 | <0.01 | 3.325
Region9 | South East | −0.095 | 0.007 | <0.01 | 6.489
Region10 | Greater London | −0.042 | 0.007 | <0.01 | 5.323
Region11 | Northern Ireland | −0.030 | 0.014 | 0.040 | 1.256
Region12 (base) | Scotland or others/missing | – | – | – | –

Table A.6
Parameter estimates for the haircut standard deviation model.

Variable | Variable explanation | Estimate | StdErr | ProbT
Intercept | – | 0.181 | <0.001 | <0.01
TOB bins | Time on books (in years) | 0.010 | <0.001 | <0.01


Table A.7
Parameter estimates for the single-stage LGD model.

Variable | Variable explanation | Estimate | StdErr | ProbT | VIF
Intercept | – | −0.093 | 0.005 | <0.01 | 0.000
DLTV | Loan to value at default | 0.230 | 0.002 | <0.01 | 1.263
Secondapp | Second applicant present | −0.003 | 0.001 | 0.012 | 1.105
VVAratio1 | Value of property/region average ≤ 0.9 | −0.049 | 0.004 | <0.01 | 8.976
VVAratio2 | 0.9 < Value of property/region average ≤ 1.2 | −0.050 | 0.004 | <0.01 | 5.416
VVAratio3 | 1.2 < Value of property/region average ≤ 1.5 | −0.035 | 0.004 | <0.01 | 3.093
VVAratio4 | 1.5 < Value of property/region average ≤ 1.8 | −0.018 | 0.005 | <0.01 | 2.148
VVAratio5 | 1.8 < Value of property/region average ≤ 2.4 | −0.018 | 0.005 | <0.01 | 2.037
VVAratio6 (base) | Value of property/region average > 2.4 | – | – | – | –
Previous default | Indicator for previous default | −0.032 | 0.002 | <0.01 | 1.018
Propage1 | Built before 1919 | 0.023 | 0.002 | <0.01 | 1.653
Propage2 (base) | Built 1919–1945 | – | – | – | –
Propage3 | Built after 1945 | −0.010 | 0.001 | <0.01 | 1.536
Propage4 | Age unknown | −0.133 | 0.014 | <0.01 | 1.017
Security0 | Flat or other | 0.065 | 0.002 | <0.01 | 1.370
Security1 | Detached | −0.020 | 0.002 | <0.01 | 1.628
Security2 | Semi-detached | −0.013 | 0.002 | <0.01 | 1.370
Security3 (base) | Terraced | – | – | – | –
Region0 | Others or missing | 0.054 | 0.013 | <0.01 | 1.054
Region1 | North | 0.041 | 0.004 | <0.01 | 1.758
Region2 | Yorkshire & Humberside | 0.041 | 0.003 | <0.01 | 3.011
Region3 | North West | 0.047 | 0.003 | <0.01 | 2.940
Region4 | East Midlands | 0.052 | 0.004 | <0.01 | 2.233
Region5 | West Midlands | 0.037 | 0.004 | <0.01 | 2.364
Region6 | East Anglia | 0.047 | 0.004 | <0.01 | 1.782
Region7 | Wales | 0.047 | 0.004 | <0.01 | 2.047
Region8 | South West | 0.038 | 0.003 | <0.01 | 2.936
Region9 | South East | 0.050 | 0.003 | <0.01 | 5.244
Region10 | Greater London | 0.030 | 0.003 | <0.01 | 4.265
Region11 | Northern Ireland | 0.028 | 0.006 | <0.01 | 1.333
Region12 (base) | Scotland | – | – | – | –

References

Altman, E. I., Brady, B., Resti, A., & Sironi, A. (2005). The link between default and recovery rates: theory, empirical evidence, and implications. Journal of Business, 78, 2203–2228.

Calem, P. S., & LaCour-Little, M. (2004). Risk-based capital requirements for mortgage loans. Journal of Banking and Finance, 28, 647–672.

Campbell, T. S., & Dietrich, J. K. (1983). The determinants of default on insured conventional residential mortgage loans. Journal of Finance, 38, 1569–1581.

DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44, 837–845.

Federal Register (2007). Risk-based capital standards: advanced capital adequacy framework – Basel II. 72 Fed. Reg. 69,288.

Financial Services Authority (2009). Prudential sourcebook for banks, building societies and investment firms.

Fernandez, G. C. J. (2007). Effects of multicollinearity in all possible mixed model selection. In PharmaSUG conference, statistics and pharmacokinetics. Denver, Colorado.

Gupton, G. M., & Stein, R. M. (2002). LossCalc: Model for predicting loss given default (LGD). Mimeo. Moody’s Investors Service.

Jarrow, R. (2001). Default parameter estimation using market prices. Financial Analysts Journal, 57, 75–92.

Jokivuolle, E., & Peura, S. (2003). Incorporating collateral value uncertainty in loss given default estimates and loan-to-value ratios. European Financial Management, 9, 299–314.

Lucas, A. (2006). Basel II problem solving. Conference on Basel II and credit risk modelling in consumer lending, Southampton, UK.

Peng, G. (2004). Testing normality of data using SAS. San Diego, California: PharmaSUG.

Qi, M., & Yang, X. (2009). Loss given default of high loan-to-value residential mortgages. Journal of Banking and Finance, 33, 788–799.

Quercia, R. G., & Stegman, M. A. (1992). Residential mortgage default: a review of the literature. Journal of Housing Research, 3, 341–379.

Schuermann, T. (2004). What do we know about loss given default? In D. Shimko (Ed.), Credit risk: models and management (2nd ed.). Risk Books.

Somers, M., & Whittaker, J. (2007). Quantile regression for modelling distributions of profit and loss. European Journal of Operational Research, 183, 1477–1487.

Truck, S., Harpaintner, S., & Rachev, S. T. (2005). A note on forecasting aggregate recovery rates with macroeconomic variables. Institut für Statistik und Mathematische Wirtschaftstheorie, Universität Karlsruhe, Karlsruhe, Germany.

von Furstenberg, G. M. (1969). Default risk on FHA-insured home mortgages as a function of the terms of financing: a quantitative analysis. Journal of Finance, 24, 459–477.

Mindy Leow is a Post-Doctoral Research Fellow at the Credit Research Centre in the University of Edinburgh Business School. Her educational background includes a B.Sc. in Mathematics with Economics and an M.Sc. in Operational Research, and she recently obtained her doctorate in the area of credit risk from the University of Southampton. Her research interests focus mainly on the Probability of Default (PD) and Loss Given Default (LGD) in retail loans.

Christophe Mues is a lecturer (assistant professor) at the School of Management of the University of Southampton (UK). Prior to his appointment at the University of Southampton, he was employed as a researcher at K.U. Leuven (Belgium), where he obtained the degree of Doctor of Applied Economics in November 2002. His main research interests are in the business intelligence and data mining domains. In recent years, he has developed a particular interest in applying data mining techniques to credit risk modelling in the context of Basel II and credit scoring. His findings have been published in various international journals and conference proceedings.