

#analyticsx

Juhi Bhargava Prashanth Reddy Musuku

MS in Business Analytics, Oklahoma State University

ABSTRACT

The lending business is crucial to the profitability of a bank or financial institution. Loan defaults and delays in repayment by customers lead to problems in the cash flow position. The last economic crisis in the US was triggered by loan defaults. This study aims to identify the factors contributing to loan defaults and delays in repayment, as well as the characteristics of a borrower who will honor all the obligations of a loan. The results will enable us to determine the relationship between loan and customer characteristics and the probability of default. The results may also be used to appraise and monitor credit risk at the time of loan approval and during the tenure of the loan. The loan data for December 2015 was extracted from the Lending Club website, an online credit marketplace. It consists of all loans issued through December 2015, along with the loan status. It contains 111 variables, such as the details of the customer’s loan account, amount, application, and purpose. There are around 420,000 observations. The factors contributing to loan default are identified and predicted using regression, decision tree, and artificial neural network models.

METHODS

Data Exploration and Preparation

• The data was extracted from the Lending Club website, an online credit marketplace

• The original data set has 111 variables – 84 interval variables and 27 nominal variables

• The final data set has 71 variables – 58 interval variables and 13 nominal variables

• Loan_status is the target variable with 7 levels

• Variables were removed for the following reasons:

  • Similar (redundant) variables, such as id and member_id

  • Too many levels: zip_code, emp_title, purpose

  • More than 50% missing values

• Log transformation was applied to reduce the skewness

• Missing values were imputed with the Tree method for class variables and the Mean method for interval variables

• Funded_amt and funded_amt_inv show similar distribution

• The bar graph shows that most of the loans are current
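The preparation steps listed above were carried out in SAS® Enterprise Miner. As a rough illustration only, the sketch below mimics three of them in plain Python: dropping variables with more than 50% missing values, mean imputation of an interval variable, and a log transformation. The helper names, the toy records, the `annual_inc` and `desc` columns, and the choice of log(1 + x) are assumptions made for the example, not details taken from the poster. Tree-based imputation of class variables is not shown.

```python
import math
from statistics import mean

def drop_sparse_columns(rows, threshold=0.5):
    """Drop columns whose share of missing (None) values exceeds threshold."""
    columns = list(rows[0].keys())
    n = len(rows)
    keep = [c for c in columns
            if sum(r[c] is None for r in rows) / n <= threshold]
    return [{c: r[c] for c in keep} for r in rows]

def impute_mean(rows, column):
    """Replace missing values in an interval column with the column mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def log_transform(rows, column):
    """Apply log(1 + x) to a non-negative, right-skewed interval column."""
    return [{**r, column: math.log1p(r[column])} for r in rows]

# Toy records; all values are invented for illustration only.
loans = [
    {"funded_amt": 1000.0, "annual_inc": 40000.0, "desc": None},
    {"funded_amt": 5000.0, "annual_inc": None, "desc": None},
    {"funded_amt": 20000.0, "annual_inc": 60000.0, "desc": "car"},
]
loans = drop_sparse_columns(loans)          # "desc" is 2/3 missing, so it is dropped
loans = impute_mean(loans, "annual_inc")    # the gap is filled with the observed mean
loans = log_transform(loans, "funded_amt")  # compresses the long right tail
```

The order used here (reduce, impute, transform) is one plausible choice; the poster does not state the order in which the steps were applied.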

DESCRIPTIVE ANALYSIS

Identifying the factors responsible for loan defaults and classification of customers using SAS® Enterprise Miner


DECISION TREE

MODEL COMPARISON AND CONCLUSION

NEURAL NETWORK

• The neural network comprises three hidden layers

• Average Squared Error is 0.00506 for validation data

• Misclassification rate is 0.022888 for validation data

• The optimal average squared error and misclassification rate occur at iteration 85
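The two fit statistics reported above can be illustrated with a small Python sketch. It assumes that average squared error for a nominal target is the squared difference between each 0/1 level indicator and its predicted probability, averaged over all cases and all target levels, and that a case is misclassified when its most probable level differs from the actual one. The function names, the three loan_status levels, and the probabilities are made up for the example.

```python
def average_squared_error(probabilities, targets, levels):
    """Mean of (indicator - probability)^2 over all cases and target levels."""
    total = 0.0
    for probs, actual in zip(probabilities, targets):
        for level in levels:
            indicator = 1.0 if level == actual else 0.0
            total += (indicator - probs[level]) ** 2
    return total / (len(targets) * len(levels))

def misclassification_rate(probabilities, targets):
    """Share of cases whose most probable level differs from the actual level."""
    wrong = sum(1 for probs, actual in zip(probabilities, targets)
                if max(probs, key=probs.get) != actual)
    return wrong / len(targets)

# Illustrative subset of loan_status levels with invented probabilities.
levels = ["Current", "Fully Paid", "Default"]
predicted = [
    {"Current": 0.8, "Fully Paid": 0.1, "Default": 0.1},
    {"Current": 0.3, "Fully Paid": 0.6, "Default": 0.1},
    {"Current": 0.5, "Fully Paid": 0.2, "Default": 0.3},
]
actual = ["Current", "Fully Paid", "Default"]

ase = average_squared_error(predicted, actual, levels)
mis = misclassification_rate(predicted, actual)
```

In this toy data the third case is assigned to "Current" although it is a "Default", so one of three cases is misclassified.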

ACKNOWLEDGEMENT

• I thank Dr. Goutam Chakraborty, Professor, Department of Marketing, and founder of the SAS and OSU Data Mining Certificate program, Oklahoma State University, for his support and guidance

RULE INDUCTION

• The Rule Induction prediction model combines decision tree and regression models to predict nominal targets

• Average Squared Error for validation data is 0.004781

• Misclassification rate for validation data is 0.019946

• Cumulative lift separated the training and validation data at 40% depth with a high level of lift
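For readers unfamiliar with cumulative lift at a given depth, here is a minimal Python sketch. It assumes the usual definition: rank cases by predicted event probability, take the top share of cases given by the depth, and divide the event rate in that group by the overall event rate. The function name and the scores are hypothetical and are not taken from the poster's results.

```python
def cumulative_lift(scores, is_event, depth):
    """Event rate in the top `depth` share of cases divided by the overall rate."""
    ranked = sorted(zip(scores, is_event), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * depth))
    top_rate = sum(flag for _, flag in ranked[:top_n]) / top_n
    overall_rate = sum(is_event) / len(is_event)
    return top_rate / overall_rate

# Invented scores and outcomes (1 = event of interest, 0 = otherwise).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
is_event = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

lift_at_40 = cumulative_lift(scores, is_event, 0.40)
```

Here the top 40% of cases contain three of the three events, giving an event rate of 0.75 against an overall rate of 0.30, which is a cumulative lift of about 2.5.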


• Variable importance identifies the important variables: out_prncp, last_pymnt_d, total_rec_prncp, funded_amt, last_pymt_amt

• There are 17 leaf nodes in the tree map

• For a loan to turn into default, the rule is:

  out_prncp >= 0.195 or missing

  AND last_pymnt_d in (16-MAR, 16-APR, 16-FEB, 16-JAN) or missing

  AND last_pymnt_d in (16-JAN) or missing
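A decision tree rule of this kind is a conjunction of the split conditions along the path to a leaf, with missing values routed down the same branch as the listed condition. The Python sketch below encodes the rule under one reading of the poster: that each last_pymnt_d condition is a set-membership test. The function name is hypothetical, `None` stands for a missing value, and the scale of the 0.195 threshold is not stated in the poster.

```python
def matches_default_rule(out_prncp, last_pymnt_d):
    """Return True when a record follows the path to the default leaf.

    None stands for a missing value; each split sends missing values
    down the same branch as the listed condition.
    """
    first_split = {"16-MAR", "16-APR", "16-FEB", "16-JAN"}
    second_split = {"16-JAN"}
    principal_ok = out_prncp is None or out_prncp >= 0.195
    first_ok = last_pymnt_d is None or last_pymnt_d in first_split
    second_ok = last_pymnt_d is None or last_pymnt_d in second_split
    return principal_ok and first_ok and second_ok

# Example records (values invented for illustration).
on_path = matches_default_rule(0.5, "16-JAN")
off_path = matches_default_rule(0.5, "16-MAR")
```

Because the second date split is a subset of the first, only 16-JAN or a missing payment date satisfies both conditions under this reading.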

• The decision tree turns out to be the best of the models compared, based on Average Squared Error and Misclassification Rate

• The main factors responsible for loan status are the remaining outstanding principal for the total amount funded, the principal received to date, the last payment month and amount, and the funded loan amount