Capstone Project - Credit Risk Analysis
Problem statement :
• Identifying the right Customers using Predictive Modelling.
• Utilizing past bank data to identify Risky Customers and avoid Credit Risk.
• Creating strategies to mitigate the acquisition risks.
• Assess the Final Financial benefits of the model.
Business Understanding :
• Analyzing and Understanding the Demographic and Credit Bureau Data Set.
• Performance Tag indicates whether a customer has gone 90 days past due.
• The missing/NA values are identified and imputed using WOE/Information Value analysis.
• Building a predictive model using both Demographic and Credit Bureau data in order to understand the predictive power.
• Evaluating the model and validating the likelihood of default for rejected candidates.
• Building an Application Scorecard with Good to Bad odds of 10 to 1.
• Assessing the financial benefits to the organization basis the pr
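The WOE/Information Value step mentioned above can be illustrated with a minimal Python sketch. It assumes the common convention WOE = ln(% of goods / % of bads) per bin, with IV as the sum of (% goods - % bads) * WOE; some packages use the opposite sign. The function name `woe_iv`, the bin labels, and the counts are made up for illustration and are not taken from the project data. Missing values are treated as their own "NA" bin, so imputing them amounts to replacing them with that bin's WOE.

```python
import math
from collections import Counter


def woe_iv(bins, targets):
    """Return ({bin: WOE}, IV) for one binned predictor.

    targets uses 1 for bad (defaulted) and 0 for good customers.
    """
    good = Counter(b for b, t in zip(bins, targets) if t == 0)
    bad = Counter(b for b, t in zip(bins, targets) if t == 1)
    total_good = sum(good.values())
    total_bad = sum(bad.values())

    woe = {}
    iv = 0.0
    for b in set(bins):
        # WOE is undefined when a bin has no goods or no bads.
        if good[b] == 0 or bad[b] == 0:
            raise ValueError(f"bin {b!r} needs both good and bad cases")
        pct_good = good[b] / total_good
        pct_bad = bad[b] / total_bad
        woe[b] = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe[b]
    return woe, iv


# Made-up sample: (bin label, number of goods, number of bads).
sample = [("low", 30, 10), ("high", 50, 5), ("NA", 20, 5)]
bins, targets = [], []
for label, n_good, n_bad in sample:
    bins += [label] * (n_good + n_bad)
    targets += [0] * n_good + [1] * n_bad

woe, iv = woe_iv(bins, targets)

# Imputation step: every value, including "NA", is replaced by its bin's WOE.
encoded = [woe[b] for b in bins]
print(woe, round(iv, 4))
```

A bin with a negative WOE holds relatively more bad customers than the population as a whole, so in this sketch the "NA" bin would be scored as slightly riskier than average.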
Business Understanding
Demographic Dataset:
• Consists of categorical data related to the customer, such as Age, Marital Status, Gender, No of Dependents, Income, Education, Profession, No of Months in Current Residence and Current Job, and Performance Tag.
• Performance Tag represents whether the customer has gone beyond 90 days past due.

Variables                                      Type
Application ID                                 <int>
Age                                            <int>
Gender                                         <chr>
Marital Status (at the time of application)    <chr>
No of dependents                               <chr>
Income                                         <dbl>
Education                                      <chr>
Profession                                     <chr>
Type of residence                              <chr>
No of months in current residence              <int>
No of months in current company                <int>
Performance Tag                                <int>
[Process flow diagram. Steps: getting the basic business and data understanding; cleaning the variables and dataset; removing duplicates, identifying missing/NA values and applying outlier treatment; merging the Demographic and Credit Bureau data basis Application ID; capping Age and binning variables such as Age and Income to analyse them in detail; imputing missing values using Information Value analysis and identifying important variables.]
Data Cleaning
Duplicate ID Identification and Cleansing:
• Both the Demographic and Credit Bureau data sets had duplicate Application IDs (653287861, 671989187, 765011468), which have been removed from both data sets.
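As a rough illustration of the duplicate-ID cleanup, the sketch below drops every row whose Application ID occurs more than once. Whether the project dropped all copies or kept one of them is not stated, so dropping all copies is an assumption here. The helper name `drop_duplicate_ids` and the sample rows (including the non-duplicated IDs and the ages) are hypothetical.

```python
from collections import Counter


def drop_duplicate_ids(rows, key="Application ID"):
    """Return only the rows whose ID appears exactly once."""
    counts = Counter(row[key] for row in rows)
    return [row for row in rows if counts[row[key]] == 1]


# Hypothetical rows; only 653287861 is an ID named in the text.
demographic = [
    {"Application ID": 100000001, "Age": 34},
    {"Application ID": 653287861, "Age": 40},
    {"Application ID": 653287861, "Age": 26},
    {"Application ID": 100000002, "Age": 51},
]

cleaned = drop_duplicate_ids(demographic)
remaining_ids = [row["Application ID"] for row in cleaned]
print(remaining_ids)
```

The same function would be applied to the Credit Bureau rows before the two data sets are merged on Application ID, so that the merge cannot multiply rows.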
Outlier Treatment:
• Identifying outlier values through several techniques: plotting boxplots, calculating data percentiles, and calculating Cook's distance.
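The boxplot and percentile checks can be sketched in a few lines of Python. This example uses the usual boxplot rule (values beyond 1.5 times the interquartile range from the quartiles are flagged) and then caps flagged values at the fences. The 1.5 multiplier, the helper name `iqr_fences`, and the sample ages are assumptions for illustration, not the project's actual thresholds or data.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (lower, upper) boxplot fences based on the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up ages with one implausible value.
ages = [27, 31, 35, 38, 41, 44, 47, 52, 55, 120]

lower, upper = iqr_fences(ages)

# Values outside the fences are the points a boxplot would draw as outliers.
outliers = [a for a in ages if a < lower or a > upper]

# Capping: pull every value back inside the fences.
capped = [min(max(a, lower), upper) for a in ages]
print(outliers, lower, upper)
```

Capping keeps the row in the data set while limiting the influence of the extreme value, which matters when the variable is later binned for WOE analysis.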
An example showing an approach to visualizing missing values
[Modelling flow diagram. Boxes read: outlier removal using Cook's distance; running a logistic regression model, using p-value and VIF for checking multicollinearity; running WOE/IV for all predictor variables, then building a model on imputed and WOE data; building a Random Forest model to understand the accuracy; checking the accuracy, sensitivity and specificity on normal, imputed and WOE data; cut-off identification and creating a lift and gain chart; creating a score card, creating bins and understanding the cut-off score.]
Outlier Identification: Cook's Distance
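To make the idea concrete, here is a minimal sketch of Cook's distance for a simple one-predictor linear regression, written with the standard library only. The project most likely computed it on its own fitted models, so this is an illustration of the measure rather than the project's procedure. The 4/n cut-off is a common rule of thumb, and the function name and data are made up.

```python
def cooks_distance(x, y):
    """Cook's distance of each point for the least-squares line y = a + b*x."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)

    distances = []
    for xi, e in zip(x, residuals):
        # Leverage of the point in simple linear regression.
        h = 1 / n + (xi - x_bar) ** 2 / sxx
        distances.append((e * e / (p * mse)) * h / (1 - h) ** 2)
    return distances


# Made-up data: the last observation departs sharply from the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 40]

distances = cooks_distance(x, y)
threshold = 4 / len(x)  # rule-of-thumb cut-off, an assumption
flagged = [i for i, d in enumerate(distances) if d > threshold]
print(flagged)
```

Points above the cut-off are candidates for review or removal; a large distance means that dropping the point would noticeably change the fitted coefficients.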
• Assume the loss from each default is 15 units, while the profit from each non-default is 1 unit.
• Loss avoided by rejecting 476 additional defaults = 476 * 15 = 7140 units
• Opportunity cost (profit lost) from the additional rejections = 5228 units
• Net business benefit from this model = 7140 - 5228 = 1912 units
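The arithmetic above can be checked with a short Python sketch. The figures are taken from the bullets; the variable names are illustrative, and treating the 7140 units as a loss avoided by the model is an interpretation of the slide.

```python
LOSS_PER_DEFAULT = 15        # assumed loss on each defaulting customer
PROFIT_PER_NON_DEFAULT = 1   # assumed profit per non-default; 5228 is taken as given

additional_defaults_caught = 476
profit_forgone = 5228        # opportunity lost from the additional rejections

loss_avoided = additional_defaults_caught * LOSS_PER_DEFAULT
net_benefit = loss_avoided - profit_forgone

print(loss_avoided, net_benefit)
```

With these assumptions the model stays beneficial as long as the loss per default remains above roughly 11 units, since 5228 / 476 is just under 11.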