Logistic Regression
Soumya Roy
# Load the "ISLR" package, which provides the "Default" data set
library(ISLR)
# Attach "Default" so its columns (default, student, balance, income)
# can be referenced by bare name below
# NOTE(review): attach() is generally discouraged (it can mask other
# objects); kept because the rest of the script relies on it
attach(Default)
# Names of the variables in the "Default" data set
names(Default)
## [1] "default" "student" "balance" "income"
# Dimension of the "Default" data set: 10000 observations, 4 variables
dim(Default)
## [1] 10000 4
# Descriptive summary of each variable
summary(Default)
## default student balance income
## No :9667 No :7056 Min. : 0.0 Min. : 772
## Yes: 333 Yes:2944 1st Qu.: 481.7 1st Qu.:21340
## Median : 823.6 Median :34553
## Mean : 835.4 Mean :33517
## 3rd Qu.:1166.3 3rd Qu.:43808
## Max. :2654.3 Max. :73554
## Boxplots: distribution of balance and income by default status
# (rejoined lines broken by text extraction)
boxplot(balance ~ default, col = c("red", "blue"), xlab = "Default",
        ylab = "Balance", main = "Balance vs Default")
# Fixed: the income plot previously carried ylab = "Balance"
boxplot(income ~ default, col = c("red", "blue"), xlab = "Default",
        ylab = "Income", main = "Income vs Default")
## Barplot: default rate by student status
# "tab" rather than "T" -- T is the (reassignable) alias for TRUE
tab <- table(default, student)
tab
## student
## default No Yes
## No 6850 2817
## Yes 206 127
# Column-wise proportions: P(default status | student status)
prop <- prop.table(tab, margin = 2)
prop
## student
## default No Yes
## No 0.97080499 0.95686141
## Yes 0.02919501 0.04313859
# Second row of the proportion table gives the default rate
barplot(prop[2, ], col = c("red", "blue"), xlab = "Student",
        ylab = "Default Rate")
# Fitting a logistic regression model using the predictor "balance"
# The function "glm()" fits generalized linear models, a class of models
# that includes logistic regression as a special case
# The syntax of "glm()" is similar to that of "lm()", except that we have
# to pass the argument "family=binomial" in order to fit a logistic
# regression model
# Simple logistic regression of default status on balance
mod_1 <- glm(default ~ balance, family = binomial, data = Default)
summary(mod_1)
##
## Call:
## glm(formula = default ~ balance, family = binomial, data = Default)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2697 -0.1465 -0.0589 -0.0221 3.7589
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.065e+01 3.612e-01 -29.49 <2e-16 ***
## balance 5.499e-03 2.204e-04 24.95 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2920.6 on 9999 degrees of freedom
## Residual deviance: 1596.5 on 9998 degrees of freedom
## AIC: 1600.5
##
## Number of Fisher Scoring iterations: 8
# Logistic regression of default status on the single predictor "student"
mod_2 <- glm(default ~ student, family = binomial, data = Default)
summary(mod_2)
##
## Call:
## glm(formula = default ~ student, family = binomial, data = Default)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.2970 -0.2970 -0.2434 -0.2434 2.6585
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.50413 0.07071 -49.55 < 2e-16 ***
## studentYes 0.40489 0.11502 3.52 0.000431 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2920.6 on 9999 degrees of freedom
## Residual deviance: 2908.7 on 9998 degrees of freedom
## AIC: 2912.7
##
## Number of Fisher Scoring iterations: 6
# Fitting a logistic regression model using the predictors "balance",
# "student", and "income"
# Multiple logistic regression: all three predictors
mod_3 <- glm(default ~ balance + student + income, family = binomial,
             data = Default)
summary(mod_3)
##
## Call:
## glm(formula = default ~ balance + student + income, family = binomial,
## data = Default)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.4691 -0.1418 -0.0557 -0.0203 3.7383
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.087e+01 4.923e-01 -22.080 < 2e-16 ***
## balance 5.737e-03 2.319e-04 24.738 < 2e-16 ***
## studentYes -6.468e-01 2.363e-01 -2.738 0.00619 **
## income 3.033e-06 8.203e-06 0.370 0.71152
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2920.6 on 9999 degrees of freedom
## Residual deviance: 1571.5 on 9996 degrees of freedom
## AIC: 1579.5
##
## Number of Fisher Scoring iterations: 8
# Odds ratios and their 95% profile-likelihood CIs
# library() rather than require() so a missing package fails loudly;
# MASS supplies the profile method used by confint() on glm fits
library(MASS)
## Loading required package: MASS
exp(cbind(coef(mod_3), confint(mod_3)))
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 1.903854e-05 7.074481e-06 0.0000487808
## balance 1.005753e+00 1.005309e+00 1.0062238757
## studentYes 5.237317e-01 3.298827e-01 0.8334223982
## income 1.000003e+00 9.999870e-01 1.0000191246
# Hosmer-Lemeshow goodness-of-fit test for the fitted model
library(ResourceSelection)
## ResourceSelection 0.3-5 2019-07-22
# Recode the response as 0/1 (1 = "Yes") as required by hoslem.test()
default_new <- ifelse(default == "Yes", 1, 0)
hoslem.test(default_new, fitted(mod_3))
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: default_new, fitted(mod_3)
## X-squared = 3.6823, df = 8, p-value = 0.8846
# Using the "predict()" function to obtain probabilities of the form
# "P(Y=1|X)"
# The "type=response" option ensures output of the form "P(Y=1|X)",
# rather than other information such as the logit
mod_3.probs <- predict(mod_3, type = "response")
# Show the first ten fitted probabilities
head(mod_3.probs, 10)
## 1 2 3 4 5 6
## 1.428724e-03 1.122204e-03 9.812272e-03 4.415893e-04 1.935506e-03 1.989518e-03
## 7 8 9 10
## 2.333767e-03 1.086718e-03 1.638333e-02 2.080617e-05
# Using the "contrasts()" function to check the dummy variable created by R
# ("Yes" is coded 1, so the model estimates P(default = "Yes"))
contrasts(default)
## Yes
## No 0
## Yes 1
# Conversion of probabilities into class labels with a 0.5 cut-off
# (length of the probability vector instead of the hard-coded 10000,
# so the code survives a change in sample size)
mod_3.pred <- rep("No", length(mod_3.probs))
mod_3.pred[mod_3.probs > .5] <- "Yes"
# Creating a confusion matrix to check how many observations are correctly
# or incorrectly classified
# Confusion matrix: predicted class (rows) vs actual default status (columns)
table(mod_3.pred,default)
## default
## mod_3.pred No Yes
## No 9627 228
## Yes 40 105
# Fraction of observations for which the prediction was correct
mean(mod_3.pred==default)
## [1] 0.9732
# Misclassification rate (1 - accuracy)
mean(mod_3.pred!=default)
## [1] 0.0268
# Changing the cut-off from 0.5 to 0.2
# Conversion of probabilities into class labels
# (length of the probability vector instead of the hard-coded 10000)
mod_3.pred <- rep("No", length(mod_3.probs))
mod_3.pred[mod_3.probs > .2] <- "Yes"
# Creating a confusion matrix to check how many observations are correctly
# or incorrectly classified
# Confusion matrix: predicted class (rows) vs actual default status (columns)
table(mod_3.pred,default)
## default
## mod_3.pred No Yes
## No 9390 130
## Yes 277 203
# Fraction of observations for which the prediction was correct
mean(mod_3.pred==default)
## [1] 0.9593
# Misclassification rate (1 - accuracy)
mean(mod_3.pred!=default)
## [1] 0.0407
# ROC Plot
library(pROC)
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
roc_obj <- roc(default, mod_3.probs)
## Setting levels: control = No, case = Yes
## Setting direction: controls < cases
# Reuse the fitted roc object instead of refitting it for the plot
plot(roc_obj, col = "blue", legacy.axes = TRUE)
# Optimal (Youden) threshold; transpose = FALSE adopts the pROC >= 1.16
# default explicitly and silences the deprecation warning
coords(roc_obj, "best", ret = "threshold", transpose = FALSE)
## Warning in coords.roc(R, "best", ret = "threshold"): The 'transpose'
## argument is set to FALSE by default since pROC 1.16. Set transpose = TRUE
## explicitly to revert to the previous behavior, or transpose = FALSE to
## silence this warning. Type help(coords_transpose) for additional
## information.
## threshold
## 1 0.03120876
# Model Selection: stepwise search minimising AIC
library(MASS)
# trace = FALSE (not the reassignable alias F) suppresses per-step output;
# income is dropped, leaving default ~ balance + student
stepAIC(mod_3, trace = FALSE)
##
## Call: glm(formula = default ~ balance + student, family = binomial,
## data = Default)
##
## Coefficients:
## (Intercept) balance studentYes
## -10.749496 0.005738 -0.714878
##
## Degrees of Freedom: 9999 Total (i.e. Null); 9997 Residual
## Null Deviance: 2921
## Residual Deviance: 1572 AIC: 1578
# Lift and Gain Charts
library(funModeling)
## Loading required package: Hmisc
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
## funModeling v.1.9.4 :)
## Examples and tutorials at livebook.datascienceheroes.com
## / Now in Spanish: librovivodecienciadedatos.ai
# Store the fitted probabilities as a column so gain_lift() can
# reference them by name
Default$mod_3.probs <- predict(mod_3, type = "response")
gain_lift(data = Default, score = "mod_3.probs", target = "default")
## Population Gain Lift Score.Point
## 1 10 78.38 7.84 0.07092534738
## 2 20 91.89 4.59 0.02104190396
## 3 30 96.70 3.22 0.00880320034
## 4 40 98.80 2.47 0.00401693056
## 5 50 99.10 1.98 0.00196619538
## 6 60 99.70 1.66 0.00094485119
## 7 70 100.00 1.43 0.00044286132
## 8 80 100.00 1.25 0.00017553872
## 9 90 100.00 1.11 0.00005139724
## 10 100 100.00 1.00 0.00001025695