Credit Risk Modeling in Python Chapter3

The document discusses class imbalance in loan data, where the number of non-default loans greatly outnumbers default loans. This can negatively impact models, as incorrectly predicting a default as non-default (a false negative) is much more costly than the reverse. Gradient boosted trees aim to minimize log loss, but a model may learn to simply predict all loans as non-default due to the class imbalance.

Gradient boosted trees with XGBoost

CREDIT RISK MODELING IN PYTHON

Michael Crabtree
Data Scientist, Ford Motor Company

Decision trees
Creates predictions similar to logistic regression

Not structured like a regression

Decision trees for loan status
Simple decision tree for predicting loan_status probability of default


Decision tree impact

Loan | True loan status | Pred. loan status | Loan payoff value | Selling value | Gain/Loss
1 | 0 | 1 | $1,500 | $250 | -$1,250
2 | 0 | 1 | $1,200 | $250 | -$950
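
The gain/loss figures in the table are consistent with selling value minus payoff value. A minimal sketch of that arithmetic, using the two rows above; the variable and function names are illustrative and not part of the course material:

```python
# Rows from the table above: (loan, payoff value, selling value), in dollars
loans = [(1, 1500, 250), (2, 1200, 250)]

def gain_loss(payoff_value, selling_value):
    """Money gained (or lost, if negative) by selling the loan instead of holding it to payoff."""
    return selling_value - payoff_value

results = {loan: gain_loss(payoff, selling) for loan, payoff, selling in loans}
print(results)  # {1: -1250, 2: -950}
```

Both loans were truly non-defaults (0) but predicted as defaults (1), so selling them early gives up most of their payoff value.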


A forest of trees
XGBoost uses an ensemble of many simple trees

Each tree will be slightly better than a coin toss

Creating and training trees
Part of the xgboost Python package, called xgb here

Trains with .fit() just like the logistic regression model

# Imports used in this example (X_train and y_train are assumed to exist already)
import numpy as np
import xgboost as xgb
from sklearn.linear_model import LogisticRegression

# Create a logistic regression model
clf_logistic = LogisticRegression()
# Train the logistic regression
clf_logistic.fit(X_train, np.ravel(y_train))

# Create a gradient boosted tree model
clf_gbt = xgb.XGBClassifier()
# Train the gradient boosted tree
clf_gbt.fit(X_train, np.ravel(y_train))

Default predictions with XGBoost
Predicts with both .predict() and .predict_proba()
.predict_proba() produces probabilities between 0 and 1, one column per class (non-default, default)

.predict() produces a 1 or 0 for loan_status

# Predict probabilities of default

gbt_preds_prob = clf_gbt.predict_proba(X_test)
# Predict loan_status as a 1 or 0
gbt_preds = clf_gbt.predict(X_test)

# gbt_preds_prob
array([[0.059, 0.940], [0.121, 0.989]])
# gbt_preds
array([1, 1, 0...])
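
A minimal sketch of how the two outputs relate, using plain Python lists rather than a fitted model. The probability rows and the 0.5 cutoff are hypothetical; the second column is taken to be the probability of default, assuming the classes are ordered 0 then 1.

```python
# Hypothetical predict_proba-style output: [P(non-default), P(default)] per loan
probs = [[0.94, 0.06], [0.12, 0.88], [0.55, 0.45]]

def to_loan_status(prob_rows, threshold=0.5):
    """Label a loan 1 (default) when its default probability exceeds the threshold."""
    return [1 if row[1] > threshold else 0 for row in prob_rows]

preds = to_loan_status(probs)
print(preds)  # [0, 1, 0]
```

Lowering the threshold labels more loans as defaults, which is one way to trade accuracy for recall on defaults.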


Hyperparameters of gradient boosted trees
Hyperparameters: model parameters (settings) that cannot be learned from data

Some common hyperparameters for gradient boosted trees

learning_rate : smaller values make each step more conservative

max_depth : sets how deep each tree can go, larger means more complex

xgb.XGBClassifier(learning_rate = 0.2,
                  max_depth = 4)
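
To see why a smaller learning_rate makes each step more conservative, here is a toy sketch. It is not XGBoost's actual algorithm: each boosting round is assumed to predict the current residual perfectly, and only a learning_rate fraction of that correction is added. The function name and the numbers are hypothetical.

```python
def boosted_prediction(target, learning_rate, n_rounds):
    """Toy boosting loop: every round adds a shrunken correction toward the target."""
    prediction = 0.0
    for _ in range(n_rounds):
        residual = target - prediction          # what the next "tree" would try to fit
        prediction += learning_rate * residual  # shrink the step by the learning rate
    return prediction

# With the same number of rounds, the smaller learning rate moves less far
slow = boosted_prediction(target=1.0, learning_rate=0.2, n_rounds=4)
fast = boosted_prediction(target=1.0, learning_rate=0.8, n_rounds=4)
print(round(slow, 4), round(fast, 4))  # 0.5904 0.9984
```

A smaller learning rate usually needs more trees to reach the same fit, but each individual tree has less influence on the final prediction.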

Column selection for credit risk

Choosing specific columns
We've been using all columns for predictions

# Selects a few specific columns

X_multi = cr_loan_prep[['loan_int_rate','person_emp_length']]

# Selects all data except loan_status

X = cr_loan_prep.drop('loan_status', axis = 1)

How can you tell how important each column is?

Logistic Regression: column coefficients

Gradient Boosted Trees: ?

Column importances
Use the .get_booster() and .get_score() methods
Weight: the number of times the column appears in all trees

# Train the model

clf_gbt.fit(X_train,np.ravel(y_train))
# Print the feature importances
clf_gbt.get_booster().get_score(importance_type = 'weight')

{'person_home_ownership_RENT': 1, 'person_home_ownership_OWN': 2}


Column importance interpretation
# Column importances from importance_type = 'weight'
{'person_home_ownership_RENT': 1, 'person_home_ownership_OWN': 2}
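
As a rough illustration of what 'weight' counts, the sketch below tallies how often each column is used for a split across a set of trees. The list of splits per tree is made up for the example; in practice XGBoost derives these counts from the fitted booster.

```python
from collections import Counter

# Hypothetical ensemble: each inner list holds the columns one tree splits on
tree_splits = [
    ["person_home_ownership_OWN", "person_home_ownership_RENT"],
    ["person_home_ownership_OWN"],
]

# 'weight'-style importance: total number of splits that use each column
weights = Counter(column for tree in tree_splits for column in tree)
print(dict(weights))
```

A column that never appears in any split would be absent from the result altogether, which is why the importance dictionary can have fewer keys than the training data has columns.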


Plotting column importances
Use the plot_importance() function

xgb.plot_importance(clf_gbt, importance_type = 'weight')

{'person_income': 315, 'loan_int_rate': 195, 'loan_percent_income': 146}


Choosing training columns
Column importance is sometimes used to decide which columns to use for training

Different sets affect the performance of the models

Columns | Importances | Model Accuracy | Model Default Recall
loan_int_rate, person_emp_length | (100, 100) | 0.81 | 0.67
loan_int_rate, person_emp_length, loan_percent_income | (98, 70, 5) | 0.84 | 0.52

F1 scoring for models
Thinking about accuracy and recall for different column groups is time consuming

F1 score is a single metric that combines precision and recall (their harmonic mean)

Shows up as a part of the classification_report()
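
For reference, F1 is the harmonic mean of precision and recall. A small sketch computing it from confusion-matrix counts; the counts are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def f1_from_counts(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) for the positive class (here: defaults)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 60 defaults caught, 20 false alarms, 40 defaults missed
precision, recall, f1 = f1_from_counts(true_pos=60, false_pos=20, false_neg=40)
print(round(precision, 2), round(recall, 2), round(f1, 3))  # 0.75 0.6 0.667
```

Because it is a harmonic mean, F1 stays low whenever either precision or recall is low, so a model cannot score well by maximizing only one of them.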

Cross validation for credit models

Cross validation basics
Used to train and test the model in a way that simulates using the model on new data

Segments training data into different pieces to estimate future performance

Uses DMatrix, an internal data structure optimized for XGBoost

Early stopping tells cross validation to stop once a scoring metric has not improved for a set number of iterations
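
The early stopping rule can be sketched in a few lines of plain Python. This is an illustration of the idea under simple assumptions (higher scores are better, ties do not count as improvement), not XGBoost's internal implementation; the AUC values are made up.

```python
def rounds_until_stop(scores, early_stopping_rounds):
    """Return the index of the best round, stopping once the score has not
    improved for `early_stopping_rounds` consecutive rounds (higher is better)."""
    best_score, best_round = float("-inf"), -1
    for i, score in enumerate(scores):
        if score > best_score:
            best_score, best_round = score, i
        elif i - best_round >= early_stopping_rounds:
            break  # no improvement for the allowed number of rounds
    return best_round

# Hypothetical per-round AUC values on the held-out fold
auc_by_round = [0.80, 0.84, 0.86, 0.85, 0.86, 0.85, 0.90]
print(rounds_until_stop(auc_by_round, early_stopping_rounds=3))  # 2
```

Note that training stops before the later jump to 0.90 is ever seen: a small early stopping number saves time but can cut training short.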


How cross validation works
Trains on parts of the training data (called folds) and tests against the unused part

Final testing against the actual test set
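
A minimal sketch of how the training rows can be segmented into folds. It uses contiguous, unshuffled folds for readability; real implementations typically shuffle or stratify first. The function name is hypothetical.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_rows))
    fold_size, remainder = divmod(n_rows, n_folds)
    start = 0
    for fold in range(n_folds):
        # Spread any leftover rows over the first folds
        stop = start + fold_size + (1 if fold < remainder else 0)
        test_idx = indices[start:stop]
        train_idx = indices[:start] + indices[stop:]
        yield train_idx, test_idx
        start = stop

folds = list(k_fold_indices(n_rows=6, n_folds=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Every row is used for testing exactly once and for training in all other folds, which is what makes the averaged score a reasonable estimate of performance on new data.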

1 https://scikit-learn.org/stable/modules/cross_validation.html

Setting up cross validation within XGBoost
# Set the number of folds
n_folds = 2
# Set early stopping number
early_stop = 5
# Set any specific parameters for cross validation
params = {'objective': 'binary:logistic',
          'seed': 99, 'eval_metric': 'auc'}

'objective': 'binary:logistic' is used to specify classification for loan_status

'eval_metric':'auc' tells XGBoost to score the model's performance on AUC


Using cross validation within XGBoost
# Restructure the train data for xgboost
DTrain = xgb.DMatrix(X_train, label = y_train)
# Perform cross validation
xgb.cv(params, DTrain, num_boost_round = 5, nfold = n_folds,
       early_stopping_rounds = early_stop)

DMatrix() creates a special object for xgboost optimized for training


The results of cross validation
Creates a data frame of the values from the cross validation


Cross validation scoring
Uses cross validation and scoring metrics with cross_val_score() function in scikit-learn

# Import the module
from sklearn.model_selection import cross_val_score
# Create a gradient boosted tree model
gbt = xgb.XGBClassifier(learning_rate = 0.4, max_depth = 10)
# Run 5-fold cross validation and report the accuracy score of each fold
cross_val_score(gbt, X_train, np.ravel(y_train), cv = 5)

array([0.92748092, 0.92575308, 0.93975392, 0.93378608, 0.93336163])

Class imbalance in loan data

Not enough defaults in the data
The values of loan_status are the classes
Non-default: 0

Default: 1

y_train['loan_status'].value_counts()

loan_status | Training Data Count | Percentage of Total
0 | 13,798 | 78%
1 | 3,877 | 22%
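
The percentages follow directly from the counts. A small sketch that rebuilds a label list with the counts from the table and computes each class's share; it uses the standard library only, whereas the course works with pandas.

```python
from collections import Counter

# Rebuild a label list with the counts from the table above
labels = [0] * 13798 + [1] * 3877

counts = Counter(labels)
total = sum(counts.values())
shares = {label: count / total for label, count in counts.items()}
print({label: f"{share:.0%}" for label, share in shares.items()})  # {0: '78%', 1: '22%'}
```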


Model loss function
Gradient Boosted Trees in xgboost use a loss function of log-loss
The goal is to minimize this value

True loan status | Predicted probability | Log Loss
1 | 0.1 | 2.3
0 | 0.9 | 2.3

An inaccurately predicted default has more negative financial impact
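
The log-loss values in the table can be reproduced with the standard per-observation formula, -[y log(p) + (1 - y) log(1 - p)], where p is the predicted probability of default. A minimal sketch; the function name is illustrative.

```python
import math

def log_loss_single(y_true, p_default):
    """Log-loss for one loan, where p_default is the predicted probability of default."""
    return -(y_true * math.log(p_default) + (1 - y_true) * math.log(1 - p_default))

# The two rows from the table: both are confidently wrong predictions
loss_missed_default = log_loss_single(1, 0.1)
loss_false_alarm = log_loss_single(0, 0.9)
print(round(loss_missed_default, 1), round(loss_false_alarm, 1))  # 2.3 2.3
```

The loss function treats both mistakes identically, even though their financial consequences differ.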


The cost of imbalance
A false negative (default predicted as non-default) is much more costly

Person | Loan Amount | Potential Profit | Predicted Status | Actual Status | Losses
A | $1,000 | $10 | Default | Non-Default | -$10
B | $1,000 | $10 | Non-Default | Default | -$1,000

Log-loss for the model is the same for both, but our actual losses are not
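
The asymmetry in the table can be written out as a small function. This follows the table's simplification (a false positive forgoes the potential profit, a false negative loses the full loan amount) and is only a sketch; the function and argument names are hypothetical.

```python
def actual_loss(loan_amount, potential_profit, predicted_default, actual_default):
    """Dollar loss from a wrong prediction, following the simplified table above."""
    if predicted_default and not actual_default:
        return -potential_profit   # false positive: good loan turned away
    if not predicted_default and actual_default:
        return -loan_amount        # false negative: loan granted, then defaults
    return 0                       # correct predictions lose nothing in this sketch

loss_a = actual_loss(1000, 10, predicted_default=True, actual_default=False)
loss_b = actual_loss(1000, 10, predicted_default=False, actual_default=True)
print(loss_a, loss_b)  # -10 -1000
```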


Causes of imbalance
Data problems:
Credit data was not sampled correctly

Data storage problems

Business processes:
Measures already in place to not accept probable defaults

Probable defaults are quickly sold to other firms

Behavioral factors:
Normally, people do not default on their loans
The less often they default, the higher their credit rating

Dealing with class imbalance
Several ways to deal with class imbalance in data

Method | Pros | Cons
Gather more data | Increases number of defaults | Percentage of defaults may not change
Penalize models | Increases recall for defaults | Model requires more tuning and maintenance
Sample data differently | Least technical adjustment | Fewer defaults in data


Undersampling strategy
Combine smaller random sample of non-defaults with defaults


Combining the split data sets
The training features (X_train) and labels (y_train) must be put back together

Create two new sets based on actual loan_status

# Import used in this example
import pandas as pd

# Concat the training sets
X_y_train = pd.concat([X_train.reset_index(drop = True),
                       y_train.reset_index(drop = True)], axis = 1)
# Get the counts of defaults and non-defaults
# (value_counts() sorts by count, so the majority class comes first)
count_nondefault, count_default = X_y_train['loan_status'].value_counts()
# Separate nondefaults and defaults
nondefaults = X_y_train[X_y_train['loan_status'] == 0]
defaults = X_y_train[X_y_train['loan_status'] == 1]

Undersampling the non-defaults
Randomly sample data set of non-defaults

Concatenate with data set of defaults

# Undersample the non-defaults using sample() in pandas
nondefaults_under = nondefaults.sample(count_default)
# Concat the undersampled non-defaults with the defaults
X_y_train_under = pd.concat([nondefaults_under.reset_index(drop = True),
                             defaults.reset_index(drop = True)], axis = 0)
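
The same undersampling idea can be shown without pandas. The sketch below uses a tiny, made-up set of rows and the standard library's random module; the seed and the row layout are assumptions for the example only.

```python
import random

# Hypothetical training rows: 8 non-defaults (0) and 2 defaults (1)
rows = [{"loan_id": i, "loan_status": 0} for i in range(8)]
rows += [{"loan_id": i, "loan_status": 1} for i in range(8, 10)]

nondefaults = [r for r in rows if r["loan_status"] == 0]
defaults = [r for r in rows if r["loan_status"] == 1]

# Draw as many non-defaults as there are defaults, without replacement
rng = random.Random(123)  # fixed seed so the sample is repeatable
nondefaults_under = rng.sample(nondefaults, len(defaults))

balanced = nondefaults_under + defaults
print(len(balanced), sum(r["loan_status"] for r in balanced))  # 4 2
```

The balanced set has equal numbers of each class, at the cost of discarding most of the non-default rows.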
